ESTIMATION OF A K-MONOTONE DENSITY: LIMIT
DISTRIBUTION THEORY AND THE SPLINE 
CONNECTION 

By Fadoua Balabdaoui 1 and Jon A. Wellner 2 

University of Goettingen and University of Washington 

We study the asymptotic behavior of the Maximum Likelihood
and Least Squares Estimators of a $k$-monotone density $g_0$ at a fixed
point $x_0$ when $k > 2$. We find that the $j$th derivative of the estimators
at $x_0$ converges at the rate $n^{-(k-j)/(2k+1)}$ for $j = 0, \ldots, k-1$. The
limiting distribution depends on an almost surely uniquely defined
stochastic process $H_k$ that stays above (below) the $k$-fold integral
of Brownian motion plus a deterministic drift when $k$ is even (odd).
Both the MLE and LSE are known to be splines of degree $k-1$ with
simple knots. Establishing the order of the random gap $\tau_n^+ - \tau_n^-$,
where $\tau_n^{\pm}$ denote two successive knots, is a key ingredient of the proof
of the main results. We show that this "gap problem" can be solved
if a conjecture about the upper bound on the error in a particular
Hermite interpolation via odd-degree splines holds.

1. Introduction. 

1.1. The estimation problem and motivation. A density function $g$ on
$\mathbb{R}_+$ is monotone (or 1-monotone) if it is nonincreasing. It is 2-monotone
if it is nonincreasing and convex, and $k$-monotone for $k \ge 3$ if and only if
$(-1)^j g^{(j)}$ is nonnegative, nonincreasing and convex for $j = 0, \ldots, k-2$.

We write $\mathcal{D}_k$ for the class of all $k$-monotone densities on $\mathbb{R}_+$ and $\mathcal{M}_k$
for the class of all $k$-monotone functions (without the density restriction).
Suppose that $g_0 \in \mathcal{D}_k$ and that $X_1, \ldots, X_n$ are i.i.d. with density $g_0$. We
write $\mathbb{G}_n$ for the empirical distribution function of $X_1, \ldots, X_n$. Our main
interest is in the Maximum Likelihood Estimators (or MLE's) $\hat g_n$ of $g_0 \in \mathcal{D}_k$.
When $k = 1$, it is well known that the maximum likelihood estimator $\hat g_n$
of $g_0 \in \mathcal{D}_1$ is the Grenander [14] estimator, that is, the left derivative of the
least concave majorant $\hat{\mathbb{G}}_n$ of $\mathbb{G}_n$, and if $g_0'(x_0) < 0$ with $g_0'$ continuous in a
neighborhood of $x_0$, then

(1.1) $$n^{1/3}(\hat g_n(x_0) - g_0(x_0)) \to_d \left(\tfrac{1}{2}\, g_0(x_0)\,|g_0'(x_0)|\right)^{1/3} 2Z,$$

where $2Z$ is the slope at zero of the greatest convex minorant of two-sided
Brownian motion plus $t^2$, $t \in \mathbb{R}$; see Prakasa Rao [35], Groeneboom [15] and
Kim and Pollard [24].
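
The $k = 1$ construction can be illustrated with a minimal sketch, assuming a sample of distinct positive observations: the least concave majorant of the empirical distribution function is computed as an upper convex hull, and its slopes give the Grenander estimator. The function name `grenander` and the simulated exponential sample are illustrative choices, not part of the paper.

```python
import numpy as np

def grenander(sample):
    """Return (knots, slopes): the estimator equals slopes[i] on (knots[i], knots[i+1]]."""
    x = np.sort(np.asarray(sample, dtype=float))
    n = len(x)
    # Vertices of the empirical distribution function, starting at the origin.
    px = np.concatenate(([0.0], x))
    py = np.arange(n + 1) / n
    # Least concave majorant = upper convex hull, built with a monotone stack.
    hull = [0]
    for i in range(1, n + 1):
        while len(hull) >= 2:
            a, b = hull[-2], hull[-1]
            cross = (px[b] - px[a]) * (py[i] - py[a]) - (py[b] - py[a]) * (px[i] - px[a])
            if cross >= 0:      # b lies on or below the chord from a to i: discard it
                hull.pop()
            else:
                break
        hull.append(i)
    knots = px[hull]
    slopes = np.diff(py[hull]) / np.diff(knots)   # left derivative of the majorant
    return knots, slopes

rng = np.random.default_rng(0)
knots, slopes = grenander(rng.exponential(size=200))
```

By construction the returned slopes are decreasing and the resulting step function integrates to one.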

When $k = 2$, Groeneboom, Jongbloed and Wellner [18] considered both
the MLE and LSE and established that if the true convex and nonincreasing
density $g_0$ satisfies $g_0''(x_0) > 0$ (and $g_0''$ is continuous in a neighborhood of
$x_0$), then

(1.2) $$\begin{pmatrix} n^{2/5}(\bar g_n(x_0) - g_0(x_0)) \\ n^{1/5}(\bar g_n'(x_0) - g_0'(x_0)) \end{pmatrix} \to_d \begin{pmatrix} \left(\tfrac{1}{24}\, g_0^2(x_0)\, g_0''(x_0)\right)^{1/5} H^{(2)}(0) \\ \left(\tfrac{1}{24^3}\, g_0(x_0)\, g_0''(x_0)^3\right)^{1/5} H^{(3)}(0) \end{pmatrix},$$

where $\bar g_n$ is either the MLE or LSE and $H$ is a random cubic spline function
such that $H^{(2)}$ is convex and $H$ stays above integrated two-sided Brownian
motion plus $t^4$, $t \in \mathbb{R}$, and touches exactly at those points where $H^{(2)}$ changes
its slope; see Groeneboom, Jongbloed and Wellner [17].

Our main interest in this paper is in establishing a generalization of the
pointwise limit theory given in (1.1) and (1.2) for general $k \in \mathbb{N}$, $k \ge 1$.

Beyond the obvious motivation of extending the known results for k = 1 
and k = 2 as listed above, there are several further reasons for considering 
such extensions: 

(a) Pointwise limit distribution theory for natural nonparametric estima- 
tors of the piecewise smooth regression models of smoothness k considered 
by Mammen [29] is only available for $k \in \{1, 2\}$. Similar models (with just
one element in the partition) have been proposed for software reliability 
problems by Miller and Sofer [33]. Similarly, pointwise limit distribution 
theory is still lacking for the locally adaptive regression spline estimators 
considered by Mammen and van de Geer [30]. 

(b) The classes $\mathcal{D}_k$ of densities have mixture representations as scale
mixtures of Beta$(1, k)$ densities: as is known from Williamson [43] (see also Lévy
[26], Gneiting [13] and Balabdaoui and Wellner [2]), $g \in \mathcal{D}_k$ if and only if
there is a distribution function $F$ on $(0, \infty)$ such that

(1.3) $$g(x) = \int_0^\infty \frac{k}{y^k}\,(y - x)_+^{k-1}\, dF(y) = \int_0^\infty \frac{1}{w}\left(1 - \frac{x}{kw}\right)_+^{k-1} d\tilde F(w),$$

where $z_+ = z 1\{z \ge 0\}$ and $\tilde F = F(k\,\cdot)$. The second form of the mixture
representation in the last display makes it clear that the limiting class of
densities as $k \to \infty$, namely $\mathcal{D}_\infty$, is the class of scale mixtures of exponential
distributions. In view of Feller [11], pages 232-233, this is just the class of
completely monotone densities; see also Widder [42] and Gneiting [12]. To
the best of our knowledge, there is no pointwise limit distribution theory 
available for the MLE in any class of mixed densities based on a smooth 
mixing kernel, including this particular case in which the kernel (or mixture 
density) is the exponential scale family as studied by Jewell [22]. On the 
other hand, maximum likelihood estimators in various classes of mixture 
models with smooth kernels have been proposed in a wide range of applica- 
tions including pharmacokinetics (Mallet [27], Mallet, Mentre, Steimer and 
Lokiec [28] and Davidian and Gallant [6]), demography (Vaupel, Manton 
and Stallard [41]) and shock models and variations in hazard rates (Harris 
and Singpurwalla [20], McNolty, Doyle and Hansen [31] and Hill, Saunders 
and Laud [21]). 

(c) The whole family of mixture models $\mathcal{D}_k$ corresponding to $k \in (0, \infty)$
in (1.3) might eventually be of some interest, especially since the family of
distributions corresponding to the classical Wicksell problem is contained in
the class $\mathcal{D}_{1/2}$; see, for example, Groeneboom and Jongbloed [16].

(d) The subclass of $k$-monotone densities with mixing distribution $F$
satisfying $(-1)^{k-1} g^{(k-1)}(0) = k! \int_0^\infty y^{-k}\, dF(y) < \infty$ can be regarded as the class of
distributions arising in a generalization of Hampel's bird-watching problem
(Hampel [19]), in which birds are captured $k$ times, but only one
"intercatch" time is recorded. Based on those observed intercatch times, the goal
is to estimate the true distribution $F$ of the resting times $Y$ of the migrating
birds, which we assume to have a density $f$ with $k$th moment $\mu_k(f) < \infty$.
Furthermore, we assume that the time points of capture form the arrival time
points of a Poisson process with rate $\lambda$, that given $Y = y$, the number of
captures by time $y$ is Poisson$(\lambda y)$ with $\lambda$ small enough so that $\exp(-\lambda y) \approx 1$
and that the probability of catching a bird more than $k$ times is negligible
(see also Hampel [19] and Anevski [1]). If $S_{k,1}$ denotes the elapsed time
between the first and second captures (the only observed intercatch time),
then it follows by a derivation analogous to Hampel's that the density of the
time $S_{k,1}$ is given by



$$g(x) = \frac{k}{\mu_k(f)} \int_x^\infty (y - x)^{k-1} f(y)\, dy,$$

which is clearly $k$-monotone. We obtain $F$, the probability distribution of
$Y$, by inverting the previous mixture representation, that is,

$$F(t) = 1 - \frac{g^{(k-1)}(t)}{g^{(k-1)}(0+)}$$

at any point of continuity $t > 0$ of $F$.

In connection with (a), it is interesting to note that the definition of the
family $\mathcal{D}_k$ is equivalent to $g \in \mathcal{D}_k$ if and only if $(-1)^{k-1} g^{(k-1)}$ (where $g^{(k-1)}$
is either the left or right derivative of $g^{(k-2)}$) is nonincreasing. This follows
from Lemma 4.3 of Gneiting [13] since Gneiting's condition $\lim_{x \to \infty} g(x) = 0$
is automatic for densities. Thus the equivalent definition of $\mathcal{D}_k$ has a natural
connection with the work of Mammen [29] in the nonparametric regression
setting. In parallel to the treatment of convex regression estimation given
by Groeneboom, Jongbloed and Wellner [18], it seems clear that pointwise
distribution theory for nonparametric least squares estimators for the
regression problems in (a) could be developed if adequate theory were available
for the Maximum Likelihood and Least Squares estimators of densities in
the class $\mathcal{D}_k$, so we focus exclusively on the density case in this paper. In
Section 5, we comment further on the difficulties in obtaining corresponding
limit theory for the smooth kernel cases discussed in (b).
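
The scale-mixture representation (1.3) also gives a direct way to simulate from a $k$-monotone density: if $Y \sim F$ and $B \sim$ Beta$(1, k)$ are independent, then $YB$ has density (1.3). The following is a minimal sketch under that reading; the function name `sample_k_monotone` and the choice $F$ = Gamma$(k+1, 1)$ (for which the mixture works out to the standard exponential density) are illustrative assumptions, not taken from the paper.

```python
import numpy as np

def sample_k_monotone(n, k, mixing_sampler, rng):
    """Draw n points from the scale mixture of Beta(1, k) densities in (1.3)."""
    y = mixing_sampler(n)             # Y ~ F, the mixing distribution on (0, inf)
    b = rng.beta(1.0, k, size=n)      # B ~ Beta(1, k), independent of Y
    return y * b                      # given Y = y, y*B has density k (y - x)_+^(k-1) / y^k

rng = np.random.default_rng(1)
k = 3
# Illustrative assumption: F = Gamma(k + 1, 1), so that the mixture is Exp(1).
x = sample_k_monotone(100_000, k, lambda m: rng.gamma(k + 1.0, size=m), rng)
print(x.mean(), np.mean(x > 1.0))     # to be compared with 1 and exp(-1)
```

Other mixing distributions can be plugged in through `mixing_sampler` in the same way.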

1.2. Description of the key difficulty: the gap problem. The key result
that Groeneboom, Jongbloed and Wellner [18] used to establish (1.2) is
that $\tau_n^+ - \tau_n^- = O_p(n^{-1/5})$ as $n \to \infty$, where $\tau_n^-$ and $\tau_n^+$ are two successive
jump points of the first derivative of $\bar g_n$ in the neighborhood of $x_0$. Such a
result was already proved by Mammen [29] (see Lemma 8) in the context of
nonparametric regression, where the true regression curve, $m$, is piecewise
concave/convex or convex/concave such that $m$ is twice continuously
differentiable in the neighborhood of $x_0$, and $m''(x_0) \ne 0$. Furthermore, Mammen
[29] conjectured the right form of the asymptotic distribution of his Least
Squares estimator, which was later established by Groeneboom, Jongbloed
and Wellner [18].

To obtain the stochastic order $n^{-1/5}$ for the gap, Groeneboom, Jongbloed
and Wellner [18] used the characterizations of the estimators, together with
the "midpoint property" which we review in Section 4. For $k = 1$, the same
property can be used to establish that $n^{-1/3}$ is the order of the gap. As a
function of $k$, it is natural to conjecture that $n^{-1/(2k+1)}$ is the general form of
the order of the gap. In the problem of nonparametric regression via splines,
Mammen and van de Geer [30] have conjectured that $n^{-1/(2k+1)}$ is the order
of the distance between the knot points of their regression spline $\hat m$ under
the assumption that the true regression curve $m_0$ satisfies our same working
assumptions, but the question was left open (see Mammen and van de Geer
[30], page 400). In this paper, we refer to the problem of establishing the
order of $\tau_n^+ - \tau_n^-$ as the gap problem.

In Section 4, we show that when $k > 2$, the gap problem is closely related
to a "nonclassical" Hermite interpolation problem via odd-degree splines. To
put the interpolation problem encountered in the next section in context,
it is useful to review briefly the related complete interpolation problem for
odd-degree splines which is more "classical" and for which error bounds
uniform in the knots are now available. Given a function $f \in C^{(k-1)}[0,1]$
and an increasing sequence $0 = y_0 < y_1 < \cdots < y_m < y_{m+1} = 1$, where $m \ge 1$
is an integer, it is well known that there exists a unique spline, called the
complete spline and denoted here by $\mathcal{C}f$, of degree $2k-1$ with interior knots
$y_1, \ldots, y_m$ that satisfies the $2k + m$ conditions

$$(\mathcal{C}f)(y_i) = f(y_i), \quad i = 1, \ldots, m,$$
$$(\mathcal{C}f)^{(l)}(y_0) = f^{(l)}(y_0), \quad (\mathcal{C}f)^{(l)}(y_{m+1}) = f^{(l)}(y_{m+1}), \quad l = 0, \ldots, k-1;$$

see Schoenberg [36], de Boor [8] or Nürnberger [34], page 116, for further
discussion. If $j \in \{0, \ldots, k\}$ and $f \in C^{(k+j)}[0,1]$, then there exists $c_{k,j} > 0$
such that

(1.4) $$\sup_{0 < y_1 < \cdots < y_m < 1} \|f - \mathcal{C}f\|_\infty \le c_{k,j}\, \|f^{(k+j)}\|_\infty.$$

For $j = k$, this "uniform in knots" bound in the complete interpolation
problem was first conjectured by de Boor [7] for $k > 4$ as a generalization that
goes beyond $k = 2$, 3 and 4, for which the result was already established
(see also de Boor [8]). By a scaling argument, the bound (1.4) implies that
if $f \in C^{(2k)}[a,b]$, $a < b \in \mathbb{R}$, then the interpolation error in the complete
interpolation problem is uniformly bounded in the knots and the bound is
of the order of $(b - a)^{2k}$. One key property of the complete spline
interpolant $\mathcal{C}f$ is that $(\mathcal{C}f)^{(k)}$ is the Least Squares approximation of $f^{(k)}$ when
$f^{(k)} \in L_2([0,1])$, that is, if $\mathcal{S}_k(y_1, \ldots, y_m)$ denotes the space of splines of order
$k$ (degree $k-1$) and interior knots $y_1, \ldots, y_m$, then

(1.5) $$\int_0^1 \left((\mathcal{C}f)^{(k)}(x) - f^{(k)}(x)\right)^2 dx = \min_{S \in \mathcal{S}_k(y_1, \ldots, y_m)} \int_0^1 \left(S(x) - f^{(k)}(x)\right)^2 dx$$

(see, e.g., Schoenberg [36], de Boor [8], Nürnberger [34]). Consequently, if
$L_\infty$ denotes the space of bounded functions on $[0,1]$, then the properly defined
map

$$f^{(k)} \mapsto (\mathcal{C}f)^{(k)} \in \mathcal{S}_k(\mathbf{y}), \qquad f \in C^{(k)}[0,1],$$

where $\mathbf{y} = (y_1, \ldots, y_m)$, is the restriction of the orthoprojector $P_{\mathcal{S}_k(\mathbf{y})}$ from
$L_\infty$ to $\mathcal{S}_k(\mathbf{y})$ with respect to the inner product $(g, h) = \int_0^1 g(x) h(x)\, dx$ which
assigns to a function $g \in L_\infty$ the $k$th derivative of the complete spline
interpolant of any primitive of $g$ of order $k$ (note that the difference between
two primitives of $g$ of order $k$ is a polynomial of degree $k-1$).
de Boor [8] pointed out that in order to prove the conjecture, it is enough
to prove that

$$\sup_{\mathbf{y}} \|P_{\mathcal{S}_k(\mathbf{y})}\|_\infty = \sup_{\mathbf{y}} \sup_{g \in L_\infty} \frac{\|P_{\mathcal{S}_k(\mathbf{y})}(g)\|_\infty}{\|g\|_\infty}$$

is bounded. This was successfully achieved by Shadrin [38].

The Hermite interpolation problem which arises naturally in Section 4
appears to be another variant of interpolation problems via odd-degree splines
which has not yet been studied in the approximation theory or spline
literature. More specifically, if $f$ is some real-valued function in $C^{(j)}[0,1]$ for
some $j \ge 1$ and $0 = y_0 < y_1 < \cdots < y_{2k-4} < y_{2k-3} = 1$ is a given increasing
sequence, then there exists a unique spline $\mathcal{H}_k f$ of degree $2k-1$ and interior
knots $y_1, \ldots, y_{2k-4}$ satisfying the $4k-4$ conditions

(1.6) $$(\mathcal{H}_k f)(y_i) = f(y_i) \quad \text{and} \quad (\mathcal{H}_k f)'(y_i) = f'(y_i), \quad i = 0, \ldots, 2k-3.$$
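
For small $k$, the interpolant in (1.6) can be computed by brute force. The sketch below is an illustration only, assuming a truncated power basis (monomials up to degree $2k-1$ plus one truncated power per interior knot), which gives exactly $4k-4$ unknowns for the $4k-4$ conditions; the function name `hermite_spline` is hypothetical, and this basis becomes ill-conditioned as $k$ grows, so it is not a recommendation for serious numerical work.

```python
import numpy as np

def hermite_spline(f, df, knots, k):
    """Solve the 4k-4 conditions (1.6) in a truncated power basis.

    knots: increasing array y_0 < ... < y_{2k-3} (length 2k-2).
    Returns s(x, deriv) evaluating the spline (deriv=0) or its derivative (deriv=1).
    """
    y = np.asarray(knots, dtype=float)
    d = 2 * k - 1                      # spline degree
    interior = y[1:-1]                 # interior knots y_1, ..., y_{2k-4}

    def basis(x, deriv=0):
        x = np.atleast_1d(np.asarray(x, dtype=float))
        cols = []
        for p in range(d + 1):         # monomials x^p (or their derivatives)
            if deriv == 0:
                cols.append(x ** p)
            else:
                cols.append(p * x ** (p - 1) if p >= 1 else np.zeros_like(x))
        for t in interior:             # truncated powers (x - t)_+^d
            u = np.maximum(x - t, 0.0)
            cols.append(u ** d if deriv == 0 else d * u ** (d - 1))
        return np.column_stack(cols)

    A = np.vstack([basis(y, 0), basis(y, 1)])      # square (4k-4) x (4k-4) system
    rhs = np.concatenate([f(y), df(y)])
    coef = np.linalg.solve(A, rhs)
    return lambda x, deriv=0: basis(x, deriv) @ coef

k = 3
y = np.array([0.0, 0.3, 0.7, 1.0])
s = hermite_spline(np.exp, np.exp, y, k)
grid = np.linspace(0.0, 1.0, 201)
print(np.max(np.abs(s(grid) - np.exp(grid))))      # sup-norm interpolation error on a grid
```

Varying the knot vector `y` in this sketch is one way to explore numerically how the interpolation error behaves as knots move, which is the question behind the conjecture discussed next.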

It turns out that deriving the stochastic order of the distance between two
successive knots of the MLE and LSE in the neighborhood of the point of
estimation is very closely linked to bounding the error in this new Hermite
interpolation independently of the locations of the knots of the spline
interpolant. More precisely, if $g_t(x) = (x - t)_+^{k-1}/(k-1)!$ is the truncated power
function of degree $k-1$ with unique knot $t$, then we conjecture that there
is a constant $d_k > 0$ such that

(1.7) $$\sup_{t \in (0,1)} \; \sup_{0 < y_1 < \cdots < y_{2k-4} < 1} \|g_t - \mathcal{H}_k g_t\|_\infty \le d_k.$$

As shown in Balabdaoui and Wellner [3], the preceding formulation implies
that boundedness of the error uniformly in the knots of the spline interpolant
holds true for any $f \in C^{(k+j)}[0,1]$, that is,

$$\sup_{0 < y_1 < \cdots < y_{2k-4} < 1} \|f - \mathcal{H}_k f\|_\infty \le d_{k,j}\, \|f^{(k+j)}\|_\infty.$$

If $j = k$ and $\|f^{(2k)}\|_\infty \le 1$, it follows from Proposition 1 of Balabdaoui and
Wellner [3] that the interpolation error must be bounded above by the error
for interpolating the perfect spline,

$$S^*(t) = \frac{1}{(2k)!}\left(t^{2k} + 2 \sum_{i=1}^{2k-4} (-1)^i (t - y_i)_+^{2k}\right).$$

For a definition of perfect splines, see, for example, Bojanov, Hakopian and
Sahakian [5], Chapter 6. Based on a large number of simulations, we found
that

$$\sup_{0 < y_1 < \cdots < y_{2k-4} < 1} \|S^* - \mathcal{H}_k S^*\|_\infty \le \frac{2}{(2k)!}$$

for fairly large values of $k$ (see the last column in Table 2 in Balabdaoui and
Wellner [3]). The latter strongly suggests that for $f \in C^{(2k)}[0,1]$, we have

(1.8) $$\sup_{0 < y_1 < \cdots < y_{2k-4} < 1} \|f - \mathcal{H}_k f\|_\infty \le \frac{2}{(2k)!}\, \|f^{(2k)}\|_\infty.$$

Based on conjecture (1.7), we will prove that the distance between two
consecutive knots in a neighborhood of $x_0$ is $O_p(n^{-1/(2k+1)})$.

After a brief introduction to the MLE and LSE and their respective
characterizations, we give in Section 3 a statement of our main result which gives
the joint asymptotic distribution of the successive derivatives of the MLE
and LSE. The obtained convergence rate $n^{-(k-j)/(2k+1)}$ for the $j$th derivative
of any of the estimators was found by Balabdaoui and Wellner [2] to be the
asymptotic minimax lower bound for estimating $g_0^{(j)}(x_0)$, $j = 0, \ldots, k-1$,
under the same working assumptions. The limiting distribution depends on
the higher derivatives of $H_k$, an almost surely uniquely defined process that
stays above (below) the $(k-1)$-fold integral of Brownian motion plus the
drift $(k!/(2k)!)\,t^{2k}$ when $k$ is even (odd) and whose derivative of order $2k-2$
is convex [$H_k$ is also said to be $(2k-2)$-convex]. The process $H_k$ is
studied separately in Balabdaoui and Wellner [2]. Proving the existence of $H_k$
also relies on our conjecture in (1.7) since the key problem, also referred to
as the gap problem, depends on a very similar Hermite interpolation
problem, except that the knots of the estimators are replaced by the points of
touch between the $(k-1)$-fold integral of Brownian motion plus the drift
$(k!/(2k)!)\,t^{2k}$ and $H_k$. For more discussion of the background and related
problems, see Balabdaoui and Wellner [2]. For a discussion of algorithms
and computational issues, see Balabdaoui and Wellner [2].

2. The estimators and their characterization. Let $X_1, \ldots, X_n$ be $n$
independent observations from a common $k$-monotone density $g_0$. We
consider nonparametric estimation of $g_0$ via the Least Squares and Maximum
Likelihood methods, and that of its mixing distribution $F_0$, that is, the
distribution function on $(0, \infty)$ such that

$$g_0(x) = \int_0^\infty \frac{k\,(t - x)_+^{k-1}}{t^k}\, dF_0(t), \qquad x > 0.$$

In other words, $g_0$ is a scale mixture of Beta$(1, k)$ densities. The mixing
distribution is, furthermore, given at any point of continuity $t$ by the inversion
formula

(2.1) $$F_0(t) = \sum_{j=0}^{k} (-1)^j\, \frac{t^j}{j!}\, G_0^{(j)}(t),$$

where $G_0(t) = \int_0^t g_0(x)\, dx$. An estimator for $F_0$ can be obtained by simply
plugging in estimators of $G_0^{(j)}$, $j = 0, \ldots, k$, in the inversion formula
(2.1). We call estimation of the (mixed) $k$-monotone density $g_0$ the direct
problem and estimation of the mixing distribution function $F_0$ the inverse
problem. For more technical details on the mixture representation and the
inversion formula, see Lemma 2.1 of Balabdaoui and Wellner [2].
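
As a sanity check on the inversion formula (2.1), consider the illustrative case $g_0(x) = e^{-x}$, which is $k$-monotone for every $k$. There $G_0^{(j)}(t) = (-1)^{j+1} e^{-t}$ for $j \ge 1$, so (2.1) reduces to $1 - e^{-t} \sum_{j=0}^{k} t^j/j!$, the Gamma$(k+1, 1)$ distribution function. The sketch below evaluates both sides numerically; the function names are hypothetical and the exponential example is an assumption chosen for convenience, not one used in the paper.

```python
import math

def mixing_cdf_from_exponential(t, k):
    """Inversion formula (2.1) applied to g0(x) = exp(-x), G0(t) = 1 - exp(-t)."""
    total = 0.0
    for j in range(k + 1):
        # j-th derivative of G0: G0 itself for j = 0, (-1)^(j+1) exp(-t) for j >= 1.
        dG = 1.0 - math.exp(-t) if j == 0 else (-1) ** (j + 1) * math.exp(-t)
        total += (-1) ** j * t ** j / math.factorial(j) * dG
    return total

def gamma_cdf(t, k, steps=20000):
    """Gamma(k + 1, 1) distribution function by the midpoint rule."""
    h = t / steps
    mids = ((i + 0.5) * h for i in range(steps))
    return sum(u ** k * math.exp(-u) for u in mids) * h / math.factorial(k)

for k in (1, 2, 3, 4):
    print(k, mixing_cdf_from_exponential(2.0, k), gamma_cdf(2.0, k))
```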

We now give the definitions of the Least Squares and Maximum Likelihood
estimators; these were already considered in the case $k = 2$ by Groeneboom,
Jongbloed and Wellner [18]. The LSE, $\tilde g_n$, is the minimizer of the criterion
function

$$Q_n(g) = \frac{1}{2} \int_0^\infty g^2(t)\, dt - \int_0^\infty g(t)\, d\mathbb{G}_n(t)$$

over the class $\mathcal{M}_k$, whereas the MLE, $\hat g_n$, maximizes the "adjusted"
log-likelihood function, that is,

$$\psi_n(g) = \int_0^\infty \log g(t)\, d\mathbb{G}_n(t) - \int_0^\infty g(t)\, dt$$

over the same class. In Balabdaoui and Wellner [2], we find that both
estimators exist and are splines of degree $k-1$, that is, their $(k-1)$st derivative is
stepwise. Furthermore, as shown in Balabdaoui and Wellner [2], the LSE's
and MLE's are characterized as follows: let $\mathbb{Y}_n$ and $\mathbb{H}_n$ be the processes
defined for all $x \ge 0$ by





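(2.2) $$\mathbb{Y}_n(x) = \int_0^x \int_0^{t_{k-1}} \cdots \int_0^{t_2} \mathbb{G}_n(t_1)\, dt_1 \cdots dt_{k-1} = \int_0^x \frac{(x - t)^{k-1}}{(k-1)!}\, d\mathbb{G}_n(t)$$

and

(2.3) $$\mathbb{H}_n(x) = \int_0^x \frac{(x - t)^{k-1}}{(k-1)!}\, \tilde g_n(t)\, dt.$$

Then the $k$-monotone function $\tilde g_n$ is the LSE if and only if

(2.4) $$\mathbb{H}_n(x) \begin{cases} \ge \mathbb{Y}_n(x), & \text{for all } x \ge 0, \\ = \mathbb{Y}_n(x), & \text{if } (-1)^{k-1} \tilde g_n^{(k-1)}(x-) < (-1)^{k-1} \tilde g_n^{(k-1)}(x+). \end{cases}$$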



For the MLE, we define the process

(2.5) $$\widehat{\mathbb{H}}_n(x, g) = \int_0^x \frac{k\,(x - t)^{k-1}}{x^k\, g(t)}\, d\mathbb{G}_n(t)$$

for all $x \ge 0$ and $g \in \mathcal{D}_k$. A necessary and sufficient condition for the
$k$-monotone function $\hat g_n$ to be the MLE is then given by

(2.6) $$\widehat{\mathbb{H}}_n(x, \hat g_n) \begin{cases} \le 1, & \text{for all } x \ge 0, \\ = 1, & \text{if } (-1)^{k-1} \hat g_n^{(k-1)}(x-) < (-1)^{k-1} \hat g_n^{(k-1)}(x+). \end{cases}$$

These characterizations are crucial for understanding the local asymptotic 
behavior of the LSE and MLE. They were exploited in Balabdaoui and 
Wellner [2] to show uniform strong consistency of the estimators on intervals
of the form $[c, \infty)$, $c > 0$. Here, they prove to be once again very useful for
establishing the limit theory in both the direct and inverse problems. 



3. The asymptotic distribution. 



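3.1. The main convergence theorem. To prepare for a statement of the
main result, we first recall the following theorem from Balabdaoui and
Wellner [2] giving existence of the processes $H_k$.

Theorem 3.1. For all $k \ge 1$, let $Y_k$ denote the stochastic process defined
by

$$Y_k(t) = \begin{cases} \displaystyle\int_0^t \int_0^{s_{k-1}} \cdots \int_0^{s_2} W(s_1)\, ds_1 \cdots ds_{k-1} + (-1)^k \frac{k!}{(2k)!}\, t^{2k}, & t \ge 0, \\[2ex] \displaystyle\int_t^0 \int_{s_{k-1}}^0 \cdots \int_{s_2}^0 W(s_1)\, ds_1 \cdots ds_{k-1} + (-1)^k \frac{k!}{(2k)!}\, t^{2k}, & t < 0. \end{cases}$$

If conjecture (1.7) holds (see also the discussion in Balabdaoui and Wellner
[2]), then there exists an almost surely uniquely defined stochastic process
$H_k$ characterized by the following four conditions:

(i) the process $H_k$ stays everywhere above the process $Y_k$:

$$H_k(t) \ge Y_k(t), \qquad t \in \mathbb{R};$$

(ii) $(-1)^k H_k$ is $2k$-convex, that is, $(-1)^k H_k^{(2k-2)}$ exists and is convex;

(iii) the process $H_k$ satisfies

$$\int_{-\infty}^{\infty} (H_k(t) - Y_k(t))\, dH_k^{(2k-1)}(t) = 0;$$

(iv) if $k$ is even, $\lim_{|t| \to \infty} (H_k^{(2j)}(t) - Y_k^{(2j)}(t)) = 0$ for $j = 0, \ldots, (k-2)/2$;
if $k$ is odd, $\lim_{t \to \infty} (H_k(t) - Y_k(t)) = 0$ and $\lim_{|t| \to \infty} (H_k^{(2j+1)}(t) - Y_k^{(2j+1)}(t)) = 0$
for $j = 0, \ldots, (k-3)/2$.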

We are now able to state the main result of this paper, which generalizes 
Theorem 6.2 of Groeneboom, Jongbloed and Wellner [18] for estimating 
convex (2-monotone) densities. 
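Theorem 3.2. Let $x_0 > 0$ and $g_0$ be a $k$-monotone density such that $g_0$
is $k$-times differentiable at $x_0$ with $(-1)^k g_0^{(k)}(x_0) > 0$ and assume that $g_0^{(k)}$ is
continuous in a neighborhood of $x_0$. Let $\bar g_n$ denote either the LSE $\tilde g_n$ or the
MLE $\hat g_n$ and let $\bar F_n$ be the corresponding mixing measure defined in terms of
$\bar G_n = \int_0^{\cdot} \bar g_n(s)\, ds$ via (2.1). If conjecture (1.7) holds, then

$$\begin{pmatrix} n^{k/(2k+1)}\,(\bar g_n(x_0) - g_0(x_0)) \\ n^{(k-1)/(2k+1)}\,(\bar g_n^{(1)}(x_0) - g_0^{(1)}(x_0)) \\ \vdots \\ n^{1/(2k+1)}\,(\bar g_n^{(k-1)}(x_0) - g_0^{(k-1)}(x_0)) \end{pmatrix} \to_d \begin{pmatrix} c_0(x_0)\, H_k^{(k)}(0) \\ c_1(x_0)\, H_k^{(k+1)}(0) \\ \vdots \\ c_{k-1}(x_0)\, H_k^{(2k-1)}(0) \end{pmatrix}$$

and

$$n^{1/(2k+1)}\,(\bar F_n(x_0) - F_0(x_0)) \to_d \frac{(-1)^k x_0^k}{k!}\, c_{k-1}(x_0)\, H_k^{(2k-1)}(0),$$

where

$$c_j(x_0) = \left\{ (g_0(x_0))^{k-j} \left( \frac{(-1)^k g_0^{(k)}(x_0)}{k!} \right)^{2j+1} \right\}^{1/(2k+1)}$$

for $j = 0, \ldots, k-1$.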



3.2. The key results and outline of the proofs. Our proof of Theorem 3.2 
proceeds by solving the key gap problem assuming that our conjecture (1.7) 
holds. This is carried out in Section 4 in which the main result is the follow- 
ing. 



Lemma 3.1. Let $k \ge 3$ and $\bar g_n$ denote either the LSE $\tilde g_n$ or the MLE $\hat g_n$.
If $g_0 \in \mathcal{D}_k$ satisfies $g_0^{(k)}(x_0) \ne 0$ and conjecture (1.7) holds, then $\tau_{2k-3} - \tau_0 =
O_p(n^{-1/(2k+1)})$, where $\tau_0 < \cdots < \tau_{2k-3}$ are $2k-2$ successive jump points of
$\bar g_n^{(k-1)}$ in a neighborhood of $x_0$.

Using Lemma 3.1, we can establish the rate(s) of convergence of the
estimators $\tilde g_n$ and $\hat g_n$ and their derivatives viewed as local processes in $n^{-1/(2k+1)}$
neighborhoods of the fixed point $x_0$. This is accomplished in Proposition 3.1.
Once the rates have been established, we define for the LSE localized
versions $\tilde{\mathbb{Y}}_n^{loc}$, $\tilde{\mathbb{H}}_n^{loc}$ of the processes $\mathbb{Y}_n$, $\mathbb{H}_n$ given in (2.2) and (2.3), respectively,
and $\hat{\mathbb{Y}}_n^{loc}$, $\hat{\mathbb{H}}_n^{loc}$ related to the process $\widehat{\mathbb{H}}_n$ given in (2.5) in the case of the
MLE. The proof then proceeds by showing that:
• the localized processes $\tilde{\mathbb{Y}}_n^{loc}$ and $\hat{\mathbb{Y}}_n^{loc}$ converge weakly to $Y_{a,\sigma}$, where

$$Y_{a,\sigma}(t) = \begin{cases} \displaystyle\sigma \int_0^t \int_0^{s_{k-1}} \cdots \int_0^{s_2} W(s_1)\, ds_1 \cdots ds_{k-1} + a(-1)^k \frac{k!}{(2k)!}\, t^{2k}, & t \ge 0, \\[2ex] \displaystyle\sigma \int_t^0 \int_{s_{k-1}}^0 \cdots \int_{s_2}^0 W(s_1)\, ds_1 \cdots ds_{k-1} + a(-1)^k \frac{k!}{(2k)!}\, t^{2k}, & t < 0, \end{cases}$$

with $\sigma = \sqrt{g_0(x_0)}$, $a = (-1)^k g_0^{(k)}(x_0)/k!$, and where $W$ is a two-sided
Brownian motion process starting from 0; this can be shown by classical
methods from Shorack and Wellner [39] or alternatively via the strong
approximation of Komlós, Major and Tusnády [25];
• the localized processes $\tilde{\mathbb{H}}_n^{loc}$ and $\hat{\mathbb{H}}_n^{loc}$ satisfy Fenchel (inequality and
equality) relations relative to the localized processes $\tilde{\mathbb{Y}}_n^{loc}$ and $\hat{\mathbb{Y}}_n^{loc}$, respectively.

We then show via tightness that the localized processes $\tilde{\mathbb{H}}_n^{loc}$ and $\hat{\mathbb{H}}_n^{loc}$
(and all their derivatives up to order $2k-1$) converge to a limit process
satisfying the conditions (i)-(iv) of Theorem 3.1 and hence the limit process
in both cases is just $H_k$ (up to scaling by constants). When specialized to
$t = 0$, this gives the conclusion of Theorem 3.2.

The following is the key proposition concerning rates of convergence. 

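Proposition 3.1. Fix $x_0 > 0$ and let $g_0$ be a $k$-monotone density
such that $(-1)^k g_0^{(k)}(x_0) > 0$. Let $\bar g_n$ denote either the MLE $\hat g_n$ or the LSE
$\tilde g_n$. If conjecture (1.7) holds, then for each $M > 0$, we have

(3.1) $$\sup_{|t| \le M} \left| \bar g_n^{(j)}(x_0 + n^{-1/(2k+1)} t) - \sum_{i=j}^{k-1} \frac{n^{-(i-j)/(2k+1)}\, g_0^{(i)}(x_0)}{(i-j)!}\, t^{i-j} \right| = O_p(n^{-(k-j)/(2k+1)}) \quad \text{for } j = 0, \ldots, k-1.$$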



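For the LSE, we define the local $\mathbb{Y}_n$- and $\mathbb{H}_n$-processes by

$$\tilde{\mathbb{Y}}_n^{loc}(t) = n^{2k/(2k+1)} \int_{x_0}^{x_0 + t n^{-1/(2k+1)}} \int_{x_0}^{v_{k-1}} \cdots \int_{x_0}^{v_2} \left\{ \mathbb{G}_n(v_1) - \mathbb{G}_n(x_0) - \int_{x_0}^{v_1} \sum_{j=0}^{k-1} \frac{(u - x_0)^j}{j!}\, g_0^{(j)}(x_0)\, du \right\} \prod_{i=1}^{k-1} dv_i$$

and

$$\tilde{\mathbb{H}}_n^{loc}(t) = n^{2k/(2k+1)} \int_{x_0}^{x_0 + t n^{-1/(2k+1)}} \int_{x_0}^{v_k} \cdots \int_{x_0}^{v_2} \left\{ \tilde g_n(v_1) - \sum_{j=0}^{k-1} \frac{(v_1 - x_0)^j}{j!}\, g_0^{(j)}(x_0) \right\} dv_1 \cdots dv_k$$
$$\qquad + \tilde A_{k-1,n}\, t^{k-1} + \tilde A_{k-2,n}\, t^{k-2} + \cdots + \tilde A_{1,n}\, t + \tilde A_{0,n},$$

respectively, where

$$\tilde A_{j,n} = \frac{n^{(2k-j)/(2k+1)}}{j!} \left( \mathbb{H}_n^{(j)}(x_0) - \mathbb{Y}_n^{(j)}(x_0) \right), \qquad j = 0, \ldots, k-1.$$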

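Let $r_k = 1/(2k+1)$. In the case of the MLE, the local processes $\hat{\mathbb{Y}}_n^{loc}$ and
$\hat{\mathbb{H}}_n^{loc}$ are defined as

$$\frac{\hat{\mathbb{Y}}_n^{loc}(t)}{g_0(x_0)} = n^{2k r_k} \int_{x_0}^{x_0 + t n^{-r_k}} \int_{x_0}^{v_{k-1}} \cdots \int_{x_0}^{v_1} \frac{g_0(v) - \sum_{j=0}^{k-1} (v - x_0)^j g_0^{(j)}(x_0)/j!}{\hat g_n(v)}\, dv\, dv_1 \cdots dv_{k-1}$$
$$\qquad + n^{2k r_k} \int_{x_0}^{x_0 + t n^{-r_k}} \int_{x_0}^{v_{k-1}} \cdots \int_{x_0}^{v_1} \frac{1}{\hat g_n(v)}\, d(\mathbb{G}_n - G_0)(v)\, dv_1 \cdots dv_{k-1}$$

and

$$\frac{\hat{\mathbb{H}}_n^{loc}(t)}{g_0(x_0)} = n^{2k r_k} \int_{x_0}^{x_0 + t n^{-r_k}} \int_{x_0}^{v_{k-1}} \cdots \int_{x_0}^{v_1} \frac{\hat g_n(v) - \sum_{j=0}^{k-1} (v - x_0)^j g_0^{(j)}(x_0)/j!}{\hat g_n(v)}\, dv\, dv_1 \cdots dv_{k-1}$$
$$\qquad + \hat A_{k-1,n}\, t^{k-1} + \cdots + \hat A_{0,n},$$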

where 

n (2k-j)r k f~() (k— I)' 

A J> = ~ (fc - l)!j! ffo(xo) V g " ^ (Xo) ~ (fe - 7' 3 = 0,-..,k-l. 

In the following lemma, we will give the asymptotic distribution of the
local processes $\tilde{\mathbb{Y}}_n^{loc}$ and $\hat{\mathbb{Y}}_n^{loc}$ in terms of the $(k-1)$-fold integral of two-sided
Brownian motion, $g_0(x_0)$, and $g_0^{(k)}(x_0)$, assuming that the true density $g_0$ is
$k$-times continuously differentiable at $x_0$. We denote by $\bar{\mathbb{Y}}_n^{loc}$ either $\tilde{\mathbb{Y}}_n^{loc}$ or
$\hat{\mathbb{Y}}_n^{loc}$.

Lemma 3.2. Let $x_0$ be a point where $g_0$ is continuously $k$-times
differentiable in a neighborhood of $x_0$ with $(-1)^k g_0^{(k)}(x_0) > 0$. Then $\bar{\mathbb{Y}}_n^{loc} \Rightarrow Y_{a,\sigma}$
in $C[-K, K]$ for each $K > 0$, where

$$Y_{a,\sigma}(t) = \begin{cases} \displaystyle\sigma \int_0^t \int_0^{s_{k-1}} \cdots \int_0^{s_2} W(s_1)\, ds_1 \cdots ds_{k-1} + a(-1)^k \frac{k!}{(2k)!}\, t^{2k}, & t \ge 0, \\[2ex] \displaystyle\sigma \int_t^0 \int_{s_{k-1}}^0 \cdots \int_{s_2}^0 W(s_1)\, ds_1 \cdots ds_{k-1} + a(-1)^k \frac{k!}{(2k)!}\, t^{2k}, & t < 0, \end{cases}$$

where $W$ is standard two-sided Brownian motion starting at 0, $\sigma = \sqrt{g_0(x_0)}$
and $a = (-1)^k g_0^{(k)}(x_0)/k!$.

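Now, let $\bar{\mathbb{H}}_n^{loc}$ denote either $\tilde{\mathbb{H}}_n^{loc}$ or $\hat{\mathbb{H}}_n^{loc}$.

Lemma 3.3. The localized processes $\bar{\mathbb{Y}}_n^{loc}$ and $\bar{\mathbb{H}}_n^{loc}$ satisfy

$$\bar{\mathbb{H}}_n^{loc}(t) - \bar{\mathbb{Y}}_n^{loc}(t) \ge 0 \qquad \text{for all } t \ge 0,$$

with equality if $x_0 + t n^{-1/(2k+1)}$ is a jump point of $\bar g_n^{(k-1)}$.
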
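Lemma 3.4. The limit process $Y_{a,\sigma}$ in Lemma 3.2 satisfies

$$s_1\, Y_{a,\sigma}(s_2 t) \stackrel{d}{=} Y_k(t),$$

where

(3.2) $$s_1 = \frac{1}{\sqrt{g_0(x_0)}} \left( \frac{(-1)^k g_0^{(k)}(x_0)}{k!\, \sqrt{g_0(x_0)}} \right)^{(2k-1)/(2k+1)},$$

(3.3) $$s_2 = \left( \frac{\sqrt{g_0(x_0)}}{(-1)^k g_0^{(k)}(x_0)/k!} \right)^{2/(2k+1)}.$$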
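
The constants $s_1$ and $s_2$ can be recovered from Brownian scaling. The following is a sketch of that computation under the notation of Lemma 3.2 (with $I_{k-1}W$ written, as shorthand introduced here, for the $(k-1)$-fold integral of $W$); it is meant as a plausibility check and is not the proof given in [4].

```latex
% Brownian scaling: W(s_2 \cdot) \stackrel{d}{=} s_2^{1/2} W(\cdot), so after k-1 integrations
(I_{k-1}W)(s_2 t) \stackrel{d}{=} s_2^{\,k-1/2}\,(I_{k-1}W)(t) \quad \text{as processes in } t.
% Hence
s_1 Y_{a,\sigma}(s_2 t) \stackrel{d}{=}
  \sigma s_1 s_2^{\,k-1/2}\,(I_{k-1}W)(t)
  + a\, s_1 s_2^{\,2k}\,(-1)^k \frac{k!}{(2k)!}\, t^{2k}.
% Matching both coefficients with those of Y_k requires
\sigma s_1 s_2^{\,(2k-1)/2} = 1, \qquad a\, s_1 s_2^{\,2k} = 1.
% Dividing the two equations gives s_2^{(2k+1)/2} = \sigma / a, so
s_2 = \left(\frac{\sigma}{a}\right)^{2/(2k+1)}, \qquad
s_1 = \frac{1}{a\, s_2^{\,2k}} = \frac{1}{\sigma}\left(\frac{a}{\sigma}\right)^{(2k-1)/(2k+1)}.
```

With $\sigma = \sqrt{g_0(x_0)}$ and $a = (-1)^k g_0^{(k)}(x_0)/k!$, these agree with (3.2) and (3.3).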



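To show that the derivatives of $\bar{\mathbb{H}}_n^{loc}$ are tight, we need the following
lemma.

Lemma 3.5. For all $j \in \{0, \ldots, k-1\}$, let $\bar A_{j,n}$ denote either $\tilde A_{j,n}$ or
$\hat A_{j,n}$. If conjecture (1.7) holds, then

(3.4) $$\bar A_{j,n} = O_p(1).$$

We now rescale the processes $\bar{\mathbb{Y}}_n^{loc}$ and $\bar{\mathbb{H}}_n^{loc}$ so that the rescaled $\bar{\mathbb{Y}}_n^{loc}$
converges to the canonical limit process $Y_k$ defined in Lemma 3.4. Since the
scaling of $\bar{\mathbb{Y}}_n^{loc}$ will be exactly the same as the one we used for $Y_{a,\sigma}$, we define
$\bar{\mathbb{H}}_n^{l}$ and $\bar{\mathbb{Y}}_n^{l}$ by

$$\bar{\mathbb{H}}_n^{l}(t) = s_1\, \bar{\mathbb{H}}_n^{loc}(s_2 t), \qquad \bar{\mathbb{Y}}_n^{l}(t) = s_1\, \bar{\mathbb{Y}}_n^{loc}(s_2 t),$$

where $s_1$ and $s_2$ are given by (3.2) and (3.3), respectively.

Lemma 3.6. Let $c > 0$. Then

$$\left( (\bar{\mathbb{H}}_n^{l})^{(0)}, \ldots, (\bar{\mathbb{H}}_n^{l})^{(2k-1)} \right) \Rightarrow \left( H_k^{(0)}, \ldots, H_k^{(2k-1)} \right)$$

in $(D[-c, c])^{2k}$, where $H_k$ is the stochastic process defined in Theorem 3.1.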
To keep this paper to a reasonable length, proofs of the results of Section
3.2 and of the main convergence Theorem 3.2 can be found in Balabdaoui
and Wellner [4], Appendix 1. The arguments there are constructed along the
lines of Groeneboom, Jongbloed and Wellner [18]. However, those arguments
had to be adapted and further developed to be able to treat $k$-monotonicity
for an arbitrary integer $k > 2$. In this general case, we found that it is very
useful to consider perturbation functions to learn about the asymptotic
behavior of the estimators. Such perturbation functions need, of course, to be
permissible, that is, the resulting perturbed function must belong to the
$k$-monotone class, but they also need to have a compact support to suit the
local nature of the current estimation problem. It turns out that choices are
rather limited and that B-splines with degree $k-1$ and support $[\tau_{n,1}, \tau_{n,k+1}]$,
where $\tau_{n,1}, \ldots, \tau_{n,k+1}$ are knots of either the LSE or MLE in the
neighborhood of $x_0$, are found to be the most sensible perturbation functions to
consider. For a definition of B-splines, see, for example, Nürnberger [34],
Theorem 2.2. For technical details on the use of B-splines for constructing
perturbations, see, for example, Proposition 6.1 in Balabdaoui and Wellner
[4], Appendix 1.

4. The gap problem — spline connection. Recall that it was assumed that 
$g_0$ is $k$-times continuously differentiable at $x_0$ and that $(-1)^k g_0^{(k)}(x_0) > 0$.
Under a weaker assumption, Balabdaoui and Wellner [2] proved strong
consistency of the $(k-1)$st derivative of the MLE and LSE. This consistency
result and the above assumptions collectively imply that the number of jump
points of this derivative, in a small neighborhood of $x_0$, diverges to infinity
almost surely as the sample size $n \to \infty$. This "clustering" phenomenon is
one of the most crucial elements in studying the local asymptotics of the
estimators. The jump points then form a sequence that converges to $x_0$
almost surely and therefore the distance between two successive jump points,
for example, located just before and after $x_0$, converges to 0 as $n \to \infty$. But
it is not enough to know that the "gap" between these points converges to
0: an upper bound for this rate of convergence is needed.

To prove Lemma 3.1, we will focus first on the LSE because it is some- 
what easier to handle through the simple form of its characterization. The 
arguments for the MLE could be built upon those used for the LSE, but in 
this case one has to deal with some extra difficulties due to the nonlinear 
nature of its characterization. 

We start by describing the difficulties of establishing this result for the 
general case k > 2. 

4.1. Fundamental differences. Let $\tau_n^-$ and $\tau_n^+$ be the last and first jump
points of the $(k-1)$st derivative of the LSE $\tilde g_n$, located before and after
$x_0$, respectively. To obtain a better understanding of the gap problem, we
describe the reasoning used by Groeneboom, Jongbloed and Wellner [18] in
order to prove that $\tau_n^+ - \tau_n^- = O_p(n^{-1/5})$ for the special case $k = 2$. The
characterization of the estimator is given by

(4.1) $$\mathbb{H}_n(x) \begin{cases} \ge \mathbb{Y}_n(x), & x \ge 0, \\ = \mathbb{Y}_n(x), & \text{if } x \text{ is a jump point of } \tilde g_n', \end{cases}$$

where $\mathbb{H}_n(x) = \int_0^x (x - t)\, \tilde g_n(t)\, dt$ and $\mathbb{Y}_n(x) = \int_0^x \mathbb{G}_n(t)\, dt$. On the interval
$[\tau_n^-, \tau_n^+)$, the function $\tilde g_n'$ is constant since there are no more jump points in
this interval. This implies that $\mathbb{H}_n$ is a polynomial of degree 3 on $[\tau_n^-, \tau_n^+)$.
But from the characterization in (4.1), it follows that

$$\mathbb{H}_n(\tau_n^{\pm}) = \mathbb{Y}_n(\tau_n^{\pm}), \qquad \mathbb{H}_n'(\tau_n^{\pm}) = \mathbb{Y}_n'(\tau_n^{\pm}).$$

These four boundary conditions allow us to fully determine the cubic
polynomial $\mathbb{H}_n$ on $[\tau_n^-, \tau_n^+]$. Using the explicit expression for $\mathbb{H}_n$ and evaluating
it at the midpoint $\bar\tau_n = (\tau_n^- + \tau_n^+)/2$, Groeneboom, Jongbloed and Wellner
[18] established that

$$\mathbb{H}_n(\bar\tau_n) = \frac{\mathbb{Y}_n(\tau_n^-) + \mathbb{Y}_n(\tau_n^+)}{2} - \frac{\mathbb{G}_n(\tau_n^+) - \mathbb{G}_n(\tau_n^-)}{8}\,(\tau_n^+ - \tau_n^-).$$

Groeneboom, Jongbloed and Wellner [18] refer to this as the "midpoint
property." By applying the first condition (the inequality condition) in (4.1),
it follows that

$$\frac{\mathbb{Y}_n(\tau_n^-) + \mathbb{Y}_n(\tau_n^+)}{2} - \frac{\mathbb{G}_n(\tau_n^+) - \mathbb{G}_n(\tau_n^-)}{8}\,(\tau_n^+ - \tau_n^-) \ge \mathbb{Y}_n(\bar\tau_n).$$

The inequality in the last display can be rewritten as

$$\frac{Y_0(\tau_n^-) + Y_0(\tau_n^+)}{2} - \frac{G_0(\tau_n^+) - G_0(\tau_n^-)}{8}\,(\tau_n^+ - \tau_n^-) - Y_0(\bar\tau_n) \ge E_n,$$

where $G_0$ and $Y_0$ are the true counterparts of $\mathbb{G}_n$ and $\mathbb{Y}_n$, respectively,
and $E_n$ is a random error. Using empirical process theory, Groeneboom,
Jongbloed and Wellner [18] showed that

(4.2) $$|E_n| = O_p(n^{-4/5}) + o_p((\tau_n^+ - \tau_n^-)^4).$$

On the other hand, Groeneboom, Jongbloed and Wellner [18] established
that there exists a universal constant $C > 0$ such that

(4.3) $$\frac{Y_0(\tau_n^-) + Y_0(\tau_n^+)}{2} - \frac{G_0(\tau_n^+) - G_0(\tau_n^-)}{8}\,(\tau_n^+ - \tau_n^-) - Y_0(\bar\tau_n) = -C g_0''(x_0)\,(\tau_n^+ - \tau_n^-)^4 + o_p((\tau_n^+ - \tau_n^-)^4).$$

Combining the results in (4.2) and (4.3), it follows that

$$\tau_n^+ - \tau_n^- = O_p(n^{-1/5}).$$

The problem has two main features that make the above arguments work.
First, the polynomial $\tilde H_n$ can be fully determined on
$[\tau_n^-, \tau_n^+]$ and can therefore be evaluated at any point between
$\tau_n^-$ and $\tau_n^+$. Second, it can be expressed via the empirical
process $\mathbb{Y}_n$, which enables us to "get rid of" terms depending on
$\tilde g_n$, whose rate of convergence is still unknown at this stage.
We should also add that the problem is symmetric about $\bar\tau_n$, a
property that helps in establishing the formula derived in (4.3).
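
The midpoint property and the fourth-power behavior in (4.3) can be checked by hand in the simplest case. The following Python sketch is not part of the original argument: it evaluates the cubic Hermite interpolant through its standard basis functions, compares it with the closed-form midpoint expression, and computes the midpoint gap for the test function $x^4/24$, whose fourth derivative is identically 1. The function names and the choice of test function are illustrative assumptions.

```python
from fractions import Fraction

def cubic_hermite(ya, yb, da, db, a, b, x):
    """Cubic matching values ya, yb and slopes da, db at a and b, evaluated at x."""
    h = b - a
    s = (x - a) / h                      # local coordinate in [0, 1]
    h00 = 2 * s**3 - 3 * s**2 + 1        # standard cubic Hermite basis
    h10 = s**3 - 2 * s**2 + s
    h01 = -2 * s**3 + 3 * s**2
    h11 = s**3 - s**2
    return ya * h00 + h * da * h10 + yb * h01 + h * db * h11

def midpoint_formula(ya, yb, da, db, a, b):
    """Closed form of the interpolant at (a + b) / 2 (the "midpoint property")."""
    return (ya + yb) / 2 - (db - da) * (b - a) / 8

def midpoint_gap_quartic(a, b):
    """Interpolant minus function at the midpoint, for f(x) = x^4 / 24."""
    f = lambda x: x**4 / 24
    df = lambda x: x**3 / 6
    m = (a + b) / 2
    return cubic_hermite(f(a), f(b), df(a), df(b), a, b, m) - f(m)

a, b = Fraction(1, 3), Fraction(5, 7)
# Exact rational comparison with -(b - a)^4 / 384.
print(midpoint_gap_quartic(a, b) == -(b - a)**4 / 384)
```

In this toy case the gap is a negative multiple of $(b-a)^4$, in line with the sign and the power appearing in (4.3); the value 1/384 is specific to this check and is not claimed to be the constant $C$ as normalized in [18].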

When $k > 2$, it follows from the characterization of the LSE given in (2.4)
that for any two successive jump points $\tau_n^-$, $\tau_n^+$ of
$\tilde g_n^{(k-1)}$, the four equalities

$$\tilde H_n(\tau_n^\pm) = \mathbb{Y}_n(\tau_n^\pm) \quad\text{and}\quad \tilde H_n'(\tau_n^\pm) = \mathbb{Y}_n'(\tau_n^\pm)$$

still hold. However, these equations are not enough to determine the
polynomial $\tilde H_n$, now of degree $2k - 1$, on the interval
$[\tau_n^-, \tau_n^+]$. One would need $2k$ conditions to be able to achieve
this. [We would be in this situation if we had equality of the higher
derivatives of $\tilde H_n$ and $\mathbb{Y}_n$ at $\tau_n^-$ and $\tau_n^+$,
i.e.,

$$\tilde H_n^{(j)}(\tau_n^-) = \mathbb{Y}_n^{(j)}(\tau_n^-), \qquad \tilde H_n^{(j)}(\tau_n^+) = \mathbb{Y}_n^{(j)}(\tau_n^+) \tag{4.4}$$

for $j = 0, \ldots, k - 1$, but the characterization (2.4) does not give this much.]
Thus it becomes clear that two jump points are not sufficient to determine
the piecewise polynomial $\tilde H_n$. However, if we consider $p > 2$ jump
points $\tau_{n,0} < \cdots < \tau_{n,p-1}$ (all located, e.g., after $x_0$),
then $\tilde H_n$ is a spline of degree $2k - 1$ with interior knots
$\tau_{n,1}, \ldots, \tau_{n,p-2}$, that is, $\tilde H_n$ is a polynomial of
degree $2k - 1$ on $(\tau_{n,j}, \tau_{n,j+1})$ for $j = 0, \ldots, p - 2$ and
is $(2k - 2)$-times differentiable at its knot points
$\tau_{n,0}, \ldots, \tau_{n,p-1}$. In the next subsection, we prove that if
$p = 2k - 2$, the spline $\tilde H_n$ is completely determined on
$[\tau_{n,0}, \tau_{n,2k-3}]$ by the conditions

$$\tilde H_n(\tau_{n,i}) = \mathbb{Y}_n(\tau_{n,i}) \quad\text{and}\quad \tilde H_n'(\tau_{n,i}) = \mathbb{Y}_n'(\tau_{n,i}), \qquad i = 0, \ldots, 2k - 3. \tag{4.5}$$

This result proves to be very useful for determining the stochastic order of
the distance between two successive jump points in a small neighborhood of
$x_0$ if conjecture (1.7) on the uniform boundedness of the error in the
"nonclassical" Hermite interpolation problem via splines of odd degree
defined in (1.6) holds.

4.2. The gap problem for the LSE — Hermite interpolation. In the next 
lemma, we prove that given $2k - 2$ successive jump points
$\tau_{n,0} < \cdots < \tau_{n,2k-3}$ of $\tilde g_n^{(k-1)}$, $\tilde H_n$ is
the unique solution of the Hermite problem given by (4.5).
In the following, we will omit writing the subscript $n$ explicitly in the
knots, but their dependence on the sample size should be kept in mind.

Lemma 4.1. The function $\tilde H_n$ characterized by (2.4) is a spline of
degree $2k - 1$. Moreover, given any $2k - 2$ successive jump points of
$\tilde H_n^{(2k-1)}$, $\tau_0 < \cdots < \tau_{2k-3}$, the spline
$\tilde H_n$ of degree $2k - 1$ is uniquely determined on
$[\tau_0, \tau_{2k-3}]$ by the values of the process $\mathbb{Y}_n$ and of its
derivative $\mathbb{Y}_n'$ at $\tau_0, \ldots, \tau_{2k-3}$.

Proof. We know that for any jump point $\tau$ of $\tilde H_n^{(2k-1)}$, we have

$$\tilde H_n(\tau) = \mathbb{Y}_n(\tau) \quad\text{and}\quad \tilde H_n'(\tau) = \mathbb{Y}_n'(\tau).$$

This can be viewed as a Hermite interpolation problem if we consider that the
interpolated function is the process $\mathbb{Y}_n$ and that the interpolating
spline is $\tilde H_n$ (see, e.g., Nürnberger [34], Definition 3.6, pages 108
and 109). Existence and uniqueness of the spline interpolant follow easily
from the Schoenberg-Whitney-Karlin-Ziegler theorem (Schoenberg and Whitney
[37], Theorem 3, page 258; Karlin and Ziegler [23], Theorem 3, page 529;
Nürnberger [34], Theorem 3.7, page 109; DeVore and Lorentz [9], Theorem 9.2,
page 162).
□
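
To make the counting behind Lemma 4.1 concrete: a spline of degree $2k-1$ with $2k-4$ simple interior knots has $2k + (2k-4) = 4k-4$ free coefficients, which matches the $2(2k-2)$ value and first-derivative conditions at the knots. The sketch below is illustrative code, not taken from the paper. It represents the interpolant in the truncated-power basis and solves the resulting square linear system in exact rational arithmetic; the helper names `basis`, `solve` and `hermite_spline` are hypothetical, and the example knots and test function are arbitrary choices.

```python
from fractions import Fraction

def basis(x, interior, deg, deriv=0):
    """Monomials 1, x, ..., x^deg followed by (x - t)_+^deg for each interior
    knot t, evaluated at x (deriv=0) or differentiated once (deriv=1)."""
    if deriv == 0:
        row = [x**j for j in range(deg + 1)]
        row += [max(x - t, 0)**deg for t in interior]
    else:
        row = [j * x**(j - 1) if j else 0 for j in range(deg + 1)]
        row += [deg * max(x - t, 0)**(deg - 1) for t in interior]
    return row

def solve(A, b):
    """Gauss-Jordan elimination over the rationals for a square system."""
    n = len(A)
    M = [[Fraction(v) for v in row] + [Fraction(rhs)] for row, rhs in zip(A, b)]
    for c in range(n):
        p = next(r for r in range(c, n) if M[r][c] != 0)   # fails if singular
        M[c], M[p] = M[p], M[c]
        M[c] = [v / M[c][c] for v in M[c]]
        for r in range(n):
            if r != c and M[r][c] != 0:
                M[r] = [vr - M[r][c] * vc for vr, vc in zip(M[r], M[c])]
    return [M[r][n] for r in range(n)]

def hermite_spline(knots, f, df):
    """Spline of degree 2k-1 (2k-2 knots given) matching f and f' at every knot."""
    k = len(knots) // 2 + 1
    deg, interior = 2 * k - 1, knots[1:-1]
    A, b = [], []
    for t in knots:
        A.append(basis(t, interior, deg));    b.append(f(t))
        A.append(basis(t, interior, deg, 1)); b.append(df(t))
    coef = solve(A, b)
    return lambda x: sum(c * v for c, v in zip(coef, basis(x, interior, deg)))

# Example with k = 3: four knots, quintic spline, test function x^6.
knots = [Fraction(t) for t in (-3, -1, 1, 3)]
S = hermite_spline(knots, lambda x: x**6, lambda x: 6 * x**5)
print(S(Fraction(0)), S(Fraction(2)))
```

The truncated-power basis is used here only because it makes the smoothness at the interior knots automatic; it is not a numerically stable choice for many or closely spaced knots, where a B-spline basis would be preferable.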

In the following lemma, we prove a preparatory result that will be used
later for deriving the stochastic order of the distance between successive
knots, $\tau_0, \ldots, \tau_{2k-3}$, of $\tilde g_n$ in a neighborhood of
$x_0$. Let $\mathcal{H}_k$ again denote the spline interpolation operator
which assigns to each differentiable function $f$ the unique spline
$\mathcal{H}_k[f]$ with interior knots $\tau_1, \ldots, \tau_{2k-4}$ and
degree $2k - 1$, and satisfying the boundary conditions given in (1.6).

Lemma 4.2. Let $\bar\tau \in \bigcup_{i=0}^{2k-4} (\tau_i, \tau_{i+1})$. If
$e_k(t)$ denotes the error at $t$ of the Hermite interpolation of the function
$x^{2k}/(2k)!$, that is,

$$e_k(t) = \frac{t^{2k}}{(2k)!} - \mathcal{H}_k\!\left[\frac{x^{2k}}{(2k)!}\right](t),$$

then

$$g_0^{(k)}(\bar\tau)\, e_k(\bar\tau) \le \mathbb{E}_n + \mathbb{R}_n, \tag{4.6}$$

where $\mathbb{E}_n$ defined in (4.8) is a random error and $\mathbb{R}_n$
defined in (4.9) is a remainder that both depend on the knots
$\tau_0, \ldots, \tau_{2k-3}$ and the point $\bar\tau$.

Proof. Let $\bar\tau \in \bigcup_{i=0}^{2k-4} (\tau_i, \tau_{i+1})$. From the
characterization in (2.4) and the fact that
$\tilde H_n = \mathcal{H}_k[\mathbb{Y}_n]$ on $[\tau_0, \tau_{2k-3}]$, it
follows that

$$\mathcal{H}_k[\mathbb{Y}_n](\bar\tau) \ge \mathbb{Y}_n(\bar\tau).$$

Let $Y_0$ be the true counterpart of $\mathbb{Y}_n$, that is,
$Y_0(x) = \int_0^x (x - t)^{k-1} g_0(t)\,dt/(k-1)!$. We can then rewrite the
previous inequality as

$$\mathcal{H}_k[Y_0](\bar\tau) - Y_0(\bar\tau) \ge -\mathbb{E}_n, \tag{4.7}$$

where

$$\mathbb{E}_n = \mathcal{H}_k[\mathbb{Y}_n - Y_0](\bar\tau) - [\mathbb{Y}_n - Y_0](\bar\tau). \tag{4.8}$$

Based on the working assumptions, the function $Y_0$ is $(2k)$-times
continuously differentiable in a small neighborhood of $x_0$. Now, Taylor
expansion of $Y_0(t)$ with integral remainder around $\bar\tau$ up to the
order $2k$ yields

$$Y_0(t) = \sum_{j=0}^{2k-1} \frac{(t - \bar\tau)^j}{j!}\, Y_0^{(j)}(\bar\tau) + \int_{\bar\tau}^{\tau_{2k-3}} \frac{(t - u)_+^{2k-1}}{(2k-1)!}\, g_0^{(k)}(u)\,du,$$

for all $t \in [\tau_0, \tau_{2k-3}]$. Using this expansion, along with the
fact that the operator $\mathcal{H}_k$ is linear and preserves polynomials of
degree $2k - 1$, and noting that the integral remainder vanishes at
$t = \bar\tau$, we can rewrite the inequality in (4.7) as

$$\frac{1}{(2k-1)!} \int_{\bar\tau}^{\tau_{2k-3}} \mathcal{H}_k[(t - u)_+^{2k-1}](\bar\tau)\, g_0^{(k)}(u)\,du \ge -\mathbb{E}_n.$$

In the previous display, $\mathcal{H}_k[(t - u)_+^{2k-1}](\bar\tau)$ is the
Hermite spline interpolant of the truncated power function
$t \mapsto (t - u)_+^{2k-1}$ ($u$ is fixed), evaluated at the point
$\bar\tau$. Now, we can rewrite the left-hand side of the previous inequality as



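$$
\begin{aligned}
\frac{1}{(2k-1)!} \int_{\bar\tau}^{\tau_{2k-3}} & \mathcal{H}_k[(t - u)_+^{2k-1}](\bar\tau)\, g_0^{(k)}(u)\,du \\
&= \frac{g_0^{(k)}(\bar\tau)}{(2k-1)!} \int_{\bar\tau}^{\tau_{2k-3}} \mathcal{H}_k[(t - u)_+^{2k-1}](\bar\tau)\,du \\
&\quad + \frac{1}{(2k-1)!} \int_{\bar\tau}^{\tau_{2k-3}} \mathcal{H}_k[(t - u)_+^{2k-1}](\bar\tau)\,\bigl(g_0^{(k)}(u) - g_0^{(k)}(\bar\tau)\bigr)\,du \\
&= \frac{g_0^{(k)}(\bar\tau)}{(2k-1)!}\, \mathcal{H}_k\!\left[\int_{\bar\tau}^{\tau_{2k-3}} (t - u)_+^{2k-1}\,du\right](\bar\tau) + \mathbb{R}_n,
\end{aligned}
\tag{4.9}
$$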

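once again using linearity of the operator $\mathcal{H}_k$. The remainder
$\mathbb{R}_n$ is equal to the Hermite interpolant of the function

$$t \mapsto \frac{1}{(2k-1)!} \int_{\bar\tau}^{\tau_{2k-3}} (t - u)_+^{2k-1}\,\bigl(g_0^{(k)}(u) - g_0^{(k)}(\bar\tau)\bigr)\,du$$

at the point $\bar\tau$. On the other hand, we can further rewrite the
integral term in (4.9) as

$$\frac{1}{(2k-1)!}\, \mathcal{H}_k\!\left[\int_{\bar\tau}^{\tau_{2k-3}} (t - u)_+^{2k-1}\,du\right](\bar\tau) = \frac{1}{(2k-1)!}\, \mathcal{H}_k\!\left[\int_{\bar\tau}^{t} (t - u)^{2k-1}\,du\right](\bar\tau) = \mathcal{H}_k\!\left[\frac{(t - \bar\tau)^{2k}}{(2k)!}\right](\bar\tau).$$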






In other words, the integral term in (4.9) is nothing but the value of the
Hermite spline interpolant of the function
$t \mapsto (t - \bar\tau)^{2k}/(2k)!$ at the point $\bar\tau$. As claimed in
the lemma, this value is also equal to $-e_k(\bar\tau)$, where $e_k$ is the
error of the Hermite interpolation of the function $x^{2k}/(2k)!$. Indeed, let
$P_{2k-1}(t) = (t - \bar\tau)^{2k}/(2k)! - t^{2k}/(2k)!$. Since $P_{2k-1}$ is
a polynomial of degree $2k - 1$, we have

$$\mathcal{H}_k\!\left[\frac{(x - \bar\tau)^{2k}}{(2k)!}\right](t) = \mathcal{H}_k\!\left[\frac{x^{2k}}{(2k)!}\right](t) + P_{2k-1}(t).$$

If $t = \bar\tau$, $P_{2k-1}(\bar\tau) = 0 - \bar\tau^{2k}/(2k)! = -\bar\tau^{2k}/(2k)!$,
which implies that

$$\mathcal{H}_k\!\left[\frac{(x - \bar\tau)^{2k}}{(2k)!}\right](\bar\tau) = \mathcal{H}_k\!\left[\frac{x^{2k}}{(2k)!}\right](\bar\tau) - \frac{\bar\tau^{2k}}{(2k)!} = -e_k(\bar\tau).$$

□



The error $e_k$ defined in Lemma 4.2 can be recognized as a monospline of
degree $2k$ with $2k - 2$ simple knots $\tau_0, \ldots, \tau_{2k-3}$. For a
definition of monosplines, see, for example, Micchelli [32], Bojanov,
Hakopian and Sahakian [5], Nürnberger [34], page 194, or DeVore and Lorentz
[9], page 136. In the next lemma, we state an important property of $e_k$.

Lemma 4.3. The function $x \mapsto e_k(x)$ has no zeros other than
$\tau_0, \ldots, \tau_{2k-3}$ in $[\tau_0, \tau_{2k-3}]$. Furthermore,
$(-1)^k e_k \ge 0$ on $[\tau_0, \tau_{2k-3}]$.

Proof. See Balabdaoui and Wellner [4], Appendix 3. □
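
The sign statement of Lemma 4.3 can be illustrated on a single hand-computed instance; this is a sanity check, not a substitute for the proof in [4]. Assume $k = 3$ and the equally spaced knots $-3, -1, 1, 3$ (an arbitrary choice made here). By symmetry the quintic spline interpolating $x^6$ and its derivative at these knots can be written as $3 - 7x^2 + 5x^4 + 12(|x| - 1)_+^5$. The Python sketch below first confirms the interpolation conditions for this candidate and then checks on a rational grid that $e_3 = (x^6 - \text{spline})/6!$ is nonpositive, that is, $(-1)^3 e_3 \ge 0$, with zeros only at the knots.

```python
from fractions import Fraction

KNOTS = [Fraction(t) for t in (-3, -1, 1, 3)]   # assumed example knots, k = 3

def spline(x):
    """Hand-computed Hermite spline interpolant of x^6 at KNOTS (degree 5)."""
    return 3 - 7 * x**2 + 5 * x**4 + 12 * max(abs(x) - 1, 0)**5

def dspline(x):
    """First derivative of spline(x)."""
    sign = 1 if x > 0 else -1
    return -14 * x + 20 * x**3 + 60 * sign * max(abs(x) - 1, 0)**4

def e3(x):
    """Interpolation error for x^6 / 6!, i.e. the monospline e_k with k = 3."""
    return (x**6 - spline(x)) / 720

# Interpolation conditions: values and first derivatives agree at every knot.
conditions_hold = all(spline(t) == t**6 and dspline(t) == 6 * t**5 for t in KNOTS)

# Sign and zeros of e_3 on a grid of step 1/10 over [-3, 3].
grid = [Fraction(i, 10) for i in range(-30, 31)]
nonpositive = all(e3(x) <= 0 for x in grid)
zeros = [x for x in grid if e3(x) == 0]

print(conditions_hold, nonpositive, zeros == KNOTS)
```

On the middle interval the error factors as $(x^2 - 1)^2 (x^2 - 3)/720$, which makes the double zeros at $\pm 1$ and the negative sign visible directly.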

In Lemma 4.2, the key inequality in (4.6) can be rewritten as

$$(-1)^k g_0^{(k)}(\bar\tau) \cdot (-1)^k e_k(\bar\tau) \le \mathbb{E}_n + \mathbb{R}_n, \tag{4.10}$$

where the first factor on the left-hand side is already known to be positive
by $k$-monotonicity of $g_0$. Lemmas 4.4 and 4.5 are the final steps toward
establishing the order of the gap for the LSE based on conjecture (1.7).

Lemma 4.4. If conjecture (1.7) holds, then $\mathbb{E}_n$ in (4.6) of
Lemma 4.2 satisfies

$$|\mathbb{E}_n| = O_p(n^{-2k/(2k+1)}) + o_p((\tau_{2k-3} - \tau_0)^{2k}).$$



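Proof. We have

$$\mathbb{E}_n = \mathcal{H}_k[\mathbb{Y}_n - Y_0](\bar\tau) - [\mathbb{Y}_n - Y_0](\bar\tau).$$

Using (generalized) Taylor expansions of $\mathbb{Y}_n$ and $Y_0$ around the
point $\bar\tau$ up to order $k - 1$ yields

$$\mathbb{Y}_n(t) - Y_0(t) = \sum_{j=0}^{k-1} \frac{(t - \bar\tau)^j}{j!}\,\bigl(\mathbb{Y}_n^{(j)}(\bar\tau) - Y_0^{(j)}(\bar\tau)\bigr) + \int_{\bar\tau}^{t} \frac{(t - x)^{k-1}}{(k-1)!}\,d(\mathbb{G}_n - G_0)(x),$$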
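
so that, since $\mathcal{H}_k$ preserves polynomials of degree $2k - 1$ and
the integral term vanishes at $t = \bar\tau$,

$$
\begin{aligned}
\mathbb{E}_n &= \mathcal{H}_k\!\left[\frac{1}{(k-1)!} \int_{\bar\tau}^{t} (t - x)^{k-1}\,d(\mathbb{G}_n - G_0)(x)\right](\bar\tau) \\
&= \mathcal{H}_k\!\left[\int_{\bar\tau}^{\tau_{2k-3}} g_t(x)\,d(\mathbb{G}_n - G_0)(x)\right](\bar\tau), \qquad \text{where } g_t(x) = \frac{(t - x)_+^{k-1}}{(k-1)!}, \\
&= \int_{\bar\tau}^{\tau_{2k-3}} \mathcal{H}_k[g_t(x)](\bar\tau)\,d(\mathbb{G}_n - G_0)(x), \qquad \text{by linearity of } \mathcal{H}_k, \\
&= \int_{\tau_0}^{\tau_{2k-3}} f_{\bar\tau}(x)\,d(\mathbb{G}_n - G_0)(x).
\end{aligned}
$$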



Given $x \in [\tau_0, \tau_{2k-3}]$,
$f_{\bar\tau}(x) = \mathcal{H}_k[g_t(x)](\bar\tau)\, 1_{[\bar\tau, \tau_{2k-3}]}(x)$,
where $\mathcal{H}_k[g_t(x)](\bar\tau)$ is the value at $\bar\tau$ of the
Hermite spline interpolant of the function
$t \mapsto g_t(x) = (t - x)_+^{k-1}/(k - 1)!$. Thus $f_{\bar\tau}(x)$ depends
on the knots $\tau_0, \ldots, \tau_{2k-3}$ and the point
$s = \bar\tau \in [\tau_0, \tau_{2k-3}]$ and can be viewed as an element of
the class of functions

$$
\begin{aligned}
\mathcal{F}_{y_0,R} = \bigl\{ f_s(x) = f_{s,y_0,\ldots,y_{2k-3}}(x) :\ & x \in [y_0, y_{2k-3}],\ s \in [y_0, y_{2k-3}], \\
& x_0 - \delta \le y_0 < y_1 < \cdots < y_{2k-3} \le y_0 + R \le x_0 + \delta \bigr\},
\end{aligned}
\tag{4.11}
$$

where $\delta > 0$ is a fixed small number. In view of conjecture (1.7),
together with the triangle inequality, there exists a constant $C > 0$
depending only on $k$ such that

$$|f_s(x)| \le C (y_{2k-3} - y_0)^{k-1}\, 1_{[y_0, y_{2k-3}]}(x),$$

and hence the collection $\mathcal{F}_{y_0,R}$ has envelope function
$F_{y_0,R}$ given by

$$F_{y_0,R}(x) = C R^{k-1}\, 1_{[y_0, y_0 + R]}(x).$$



Furthermore, $\mathcal{F}_{y_0,R}$ is a VC-subgraph collection of functions
(see Proposition A.1 in the Appendix for a detailed argument) and hence by
van der Vaart and Wellner [40], Theorem 2.6.7, page 141,

$$\sup_Q N\bigl(\varepsilon \|F_{y_0,R}\|_{Q,2}, \mathcal{F}_{y_0,R}, L_2(Q)\bigr) \le \left(\frac{K}{\varepsilon}\right)^{V_k}$$

for $0 < \varepsilon < 1$, where $V_k = 2(V(\mathcal{F}_{y_0,R}) - 1)$ with
$V(\mathcal{F}_{y_0,R})$ the VC-dimension of the collection of subgraphs and
where the constant $K$ depends only on $V(\mathcal{F}_{y_0,R})$ [note that
from our proof of Proposition A.1, it is clear that $V(\mathcal{F}_{y_0,R})$
depends only on $k$]. It follows that

$$\sup_Q \int_0^1 \sqrt{1 + \log N\bigl(\varepsilon \|F_{y_0,R}\|_{Q,2}, \mathcal{F}_{y_0,R}, L_2(Q)\bigr)}\,d\varepsilon < \infty.$$

On the other hand,

$$E F_{y_0,R}^2(X_1) = C^2 R^{2(k-1)} \int_{y_0}^{y_0 + R} g_0(x)\,dx \le C^2 M R^{2k-1}$$

with $M = g_0(x_0 - \delta)$. Application of Lemma A.1 with $d = k$ yields

$$|\mathbb{E}_n| = o_p((\tau_{2k-3} - \tau_0)^{2k}) + O_p(n^{-2k/(2k+1)}).$$

□

Lemma 4.5. If the bound in (1.8) holds, then $\mathbb{R}_n$ of Lemma 4.2
satisfies

$$|\mathbb{R}_n| = o_p((\tau_{2k-3} - \tau_0)^{2k}).$$

Proof. By definition, $\mathbb{R}_n$ is the value at $\bar\tau$ of the
Hermite spline interpolant of the function

$$t \mapsto \frac{1}{(2k-1)!} \int_{\bar\tau}^{\tau_{2k-3}} (t - u)_+^{2k-1}\,\bigl(g_0^{(k)}(u) - g_0^{(k)}(\bar\tau)\bigr)\,du. \tag{4.12}$$

By (1.8), there exists a constant $D > 0$ depending only on $k$ such that

$$|\mathbb{R}_n| \le D \sup_{t \in [\tau_0, \tau_{2k-3}]} \bigl|g_0^{(k)}(t) - g_0^{(k)}(\bar\tau)\bigr|\,(\tau_{2k-3} - \tau_0)^{2k}.$$

In the previous bound, we used the fact that the $(2k)$th derivative of the
function in (4.12) is $g_0^{(k)}(t) - g_0^{(k)}(\bar\tau)$. But, note that
this derivative is $o_p(1)$, which follows from uniform continuity of
$g_0^{(k)}$ on compacts. This, in turn, implies the claimed bound. □

Proof of Lemma 3.1 for the LSE. Let $j_0 \in \{0, \ldots, 2k - 4\}$ be such
that $[\tau_{j_0}, \tau_{j_0+1}]$ is the largest knot interval, that is,
$\tau_{j_0+1} - \tau_{j_0} = \max_{0 \le j \le 2k-4} (\tau_{j+1} - \tau_j)$.
Let $a = \tau_0$, $b = \tau_{2k-3}$. Using the inequality in (4.10) and noting
that the bounds on $\mathbb{R}_n$ and $\mathbb{E}_n$ are independent of the
choice of $\bar\tau$ in $\bigcup_{j=0}^{2k-4} (\tau_j, \tau_{j+1})$, it
follows that

$$\sup_{\bar\tau \in (\tau_{j_0}, \tau_{j_0+1})} (-1)^k e_k(\bar\tau) \le O_p(n^{-2k/(2k+1)}) + o_p((\tau_{2k-3} - \tau_0)^{2k}).$$

Now, on the interval $[\tau_{j_0}, \tau_{j_0+1}]$, the Hermite spline
interpolant of the function $x^{2k}/(2k)!$ reduces to a polynomial of degree
$2k - 1$. On the other hand, the best uniform approximation of the function
$x^{2k}$ on $[\tau_{j_0}, \tau_{j_0+1}]$ from the space of polynomials of
degree $\le 2k - 1$ is given by the polynomial

$$x \mapsto x^{2k} - \left(\frac{\tau_{j_0+1} - \tau_{j_0}}{2}\right)^{2k} \frac{1}{2^{2k-1}}\, T_{2k}\!\left(\frac{2x - (\tau_{j_0} + \tau_{j_0+1})}{\tau_{j_0+1} - \tau_{j_0}}\right), \tag{4.13}$$

where $T_{2k}$ is the Chebyshev polynomial of degree $2k$ (defined on
$[-1, 1]$); see, for example, Nürnberger [34], Theorem 3.23, page 46, or
DeVore and Lorentz [9], Theorem 6.1, page 75. It follows that

$$\sup_{\bar\tau \in (\tau_{j_0}, \tau_{j_0+1})} (-1)^k e_k(\bar\tau) \ge \frac{1}{2^{4k-1} (2k)!}\,(\tau_{j_0+1} - \tau_{j_0})^{2k} \tag{4.14}$$

since $\|T_{2k}\|_\infty = 1$. But

$$\tau_{2k-3} - \tau_0 = \sum_{j=0}^{2k-4} (\tau_{j+1} - \tau_j) \le (2k - 3)(\tau_{j_0+1} - \tau_{j_0}).$$

Hence,

$$\sup_{\bar\tau \in (\tau_{j_0}, \tau_{j_0+1})} (-1)^k e_k(\bar\tau) \ge \frac{(\tau_{2k-3} - \tau_0)^{2k}}{(2k - 3)^{2k}\, 2^{4k-1} (2k)!}.$$

Combining the results obtained above, we conclude that

$$\frac{(-1)^k g_0^{(k)}(x_0)}{(2k - 3)^{2k}\, 2^{4k-1} (2k)!}\,(\tau_{2k-3} - \tau_0)^{2k} \le O_p(n^{-2k/(2k+1)}) + o_p((\tau_{2k-3} - \tau_0)^{2k}),$$

which implies that $\tau_{2k-3} - \tau_0 = O_p(n^{-1/(2k+1)})$. □
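
The two ingredients of the last step, the Chebyshev lower bound and the balance of powers that turns it into a rate, are elementary enough to sketch numerically. The Python code below is an illustration added here, with hypothetical function names: it generates Chebyshev coefficients by the three-term recurrence, evaluates the lower bound $h^{2k}/(2^{4k-1}(2k)!)$ for a knot interval of length $h$, and returns the exponent obtained by equating $(\text{gap})^{2k}$ with $n^{-2k/(2k+1)}$.

```python
from fractions import Fraction
from math import factorial

def chebyshev(m):
    """Coefficients of T_m, lowest degree first, via T_{j+1} = 2x T_j - T_{j-1}."""
    prev, cur = [1], [0, 1]
    if m == 0:
        return prev
    for _ in range(m - 1):
        nxt = [0] + [2 * c for c in cur]     # multiply T_j by 2x
        for i, c in enumerate(prev):         # subtract T_{j-1}
            nxt[i] -= c
        prev, cur = cur, nxt
    return cur

def error_lower_bound(k, h):
    """Lower bound on the sup of |e_k| over a knot interval of length h."""
    return Fraction(h)**(2 * k) / (2**(4 * k - 1) * factorial(2 * k))

def gap_rate_exponent(k):
    """Exponent r such that gap^(2k) of order n^(-2k/(2k+1)) gives gap of order n^r."""
    return Fraction(-2 * k, 2 * k + 1) / (2 * k)

# The leading coefficient of T_{2k} is 2^(2k-1), which is why dividing T_{2k}
# by 2^(2k-1) in (4.13) leaves a polynomial of degree at most 2k - 1.
for k in (2, 3, 4):
    print(k, chebyshev(2 * k)[-1] == 2**(2 * k - 1),
          error_lower_bound(k, 1), gap_rate_exponent(k))
```

For $k = 2$ and $h = 1$ the bound is $1/3072$, which sits below the exact midpoint error $1/384$ of cubic Hermite interpolation of $x^4/24$, as a lower bound should.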

4.3. The gap problem for the MLE. To show Lemma 3.1 for the MLE,
one needs to deal with an extra difficulty posed by the nonlinear form of
the characterization of this estimator as given in (2.6). In the following, we
show how one can get around this difficulty. The main idea is to "linearize"
the characterization of the MLE and hence be able to reuse the arguments
developed for the LSE in the previous subsection.

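Lemma 4.6. Let $\tau_0 < \cdots < \tau_{2k-3}$ be $2k - 2$ successive jump
points of $\hat g_n^{(k-1)}$. Then

$$\mathcal{H}_k[\mathbb{Y}_n] - \mathbb{Y}_n \ge g_0(\tau_0)\bigl(f_n - \mathcal{H}_k[f_n] + \Delta_n - \mathcal{H}_k[\Delta_n]\bigr)$$

on $[\tau_0, \tau_{2k-3}]$, where $\mathbb{Y}_n$ is the same empirical process
introduced in (2.2),

$$f_n(x) = \int_{\tau_0}^{x} \frac{(x - t)^{k-1}}{(k-1)!} \left(\frac{1}{\hat g_n(t)} - \frac{1}{g_0(\tau_0)}\right) d(G_0(t) - \hat G_n(t))$$

and

$$\Delta_n(x) = \int_{\tau_0}^{x} \frac{(x - t)^{k-1}}{(k-1)!} \left(\frac{1}{\hat g_n(t)} - \frac{1}{g_0(\tau_0)}\right) d(\mathbb{G}_n(t) - G_0(t)),$$

with $\hat G_n(x) = \int_0^x \hat g_n(s)\,ds$.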
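
Proof. Let $\hat G_n(x) = \int_0^x \hat g_n(s)\,ds$. The characterization in
(2.6) can be rewritten as

$$\int_0^x \frac{(x - t)^{k-1}}{\hat g_n(t)}\,d(\mathbb{G}_n(t) - \hat G_n(t)) \begin{cases} \le 0, & \text{for } x > 0, \\ = 0, & \text{if } x \text{ is a jump point of } \hat g_n^{(k-1)}. \end{cases} \tag{4.15}$$

Note that when $x$ is a jump point of $\hat g_n^{(k-1)}$, the two parts of
(4.15) imply that the first derivative of the function on the left-hand side
is equal to 0 at the jump point $x$, that is,

$$\int_0^x \frac{(x - t)^{k-2}}{\hat g_n(t)}\,d(\mathbb{G}_n(t) - \hat G_n(t)) = 0. \tag{4.16}$$

For $x > 0$, let

$$\hat H_n(x) = \int_0^x \frac{(x - t)^{k-1}}{(k-1)!}\,\hat g_n(t)\,dt.$$

Note that this $\hat H_n$ is not the function $H_n$ defined in (2.5) and that
on $[\tau_0, \tau_{2k-3}]$, $\hat H_n$ is a spline of degree $2k - 1$ with
knots $\tau_0, \ldots, \tau_{2k-3}$. Write
$w_n(t) = 1/\hat g_n(t) - 1/g_0(\tau_0)$ for brevity. For
$x \in [\tau_0, \tau_{2k-3}]$, we can write

$$
\begin{aligned}
\frac{1}{(k-1)!} & \int_0^x \frac{(x - t)^{k-1}}{\hat g_n(t)}\,d(\mathbb{G}_n(t) - \hat G_n(t)) \\
&= \frac{1}{g_0(\tau_0)} \int_0^x \frac{(x - t)^{k-1}}{(k-1)!}\,d(\mathbb{G}_n(t) - \hat G_n(t)) + \int_0^x \frac{(x - t)^{k-1}}{(k-1)!}\,w_n(t)\,d(\mathbb{G}_n(t) - \hat G_n(t)) \\
&= \frac{\mathbb{Y}_n(x) - \hat H_n(x)}{g_0(\tau_0)} + \int_0^{\tau_0} \frac{(x - t)^{k-1}}{(k-1)!}\,w_n(t)\,d(\mathbb{G}_n(t) - \hat G_n(t)) \\
&\quad + \int_{\tau_0}^{x} \frac{(x - t)^{k-1}}{(k-1)!}\,w_n(t)\,d(\mathbb{G}_n(t) - G_0(t)) + \int_{\tau_0}^{x} \frac{(x - t)^{k-1}}{(k-1)!}\,w_n(t)\,d(G_0(t) - \hat G_n(t)) \\
&= -\left[\frac{\hat H_n(x) - \mathbb{Y}_n(x)}{g_0(\tau_0)} + p_n(x) - f_n(x) - \Delta_n(x)\right].
\end{aligned}
$$

Note that

$$p_n(x) = -\int_0^{\tau_0} \frac{(x - t)^{k-1}}{(k-1)!}\,w_n(t)\,d(\mathbb{G}_n(t) - \hat G_n(t))$$

is a polynomial of degree $k - 1$. From (4.15) and (4.16), it follows that
$\hat H_n$ is the Hermite spline interpolant of the function

$$\mathbb{Y}_n + g_0(\tau_0)\{-p_n + f_n + \Delta_n\}$$

such that

$$\hat H_n \ge \mathbb{Y}_n + g_0(\tau_0)(-p_n + f_n + \Delta_n).$$

Hence,

$$\mathcal{H}_k[\mathbb{Y}_n + g_0(\tau_0)\{-p_n + f_n + \Delta_n\}] \ge \mathbb{Y}_n + g_0(\tau_0)\{-p_n + f_n + \Delta_n\}$$

on $[\tau_0, \tau_{2k-3}]$ or, equivalently, since $\mathcal{H}_k$ is linear
and reproduces the polynomial $p_n$,

$$\mathcal{H}_k[\mathbb{Y}_n] - \mathbb{Y}_n \ge g_0(\tau_0)\bigl(f_n - \mathcal{H}_k[f_n] + \Delta_n - \mathcal{H}_k[\Delta_n]\bigr).$$

□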

Since $\mathcal{H}_k[\mathbb{Y}_n] - \mathbb{Y}_n$ has already been studied
for the purposes of proving the order of the gap in the case of the LSE, the
final step is to evaluate each of the interpolation errors

$$\mathcal{E}_1 = f_n - \mathcal{H}_k[f_n] \quad\text{and}\quad \mathcal{E}_2 = \Delta_n - \mathcal{H}_k[\Delta_n]. \tag{4.17}$$

Lemma 4.7. Let $\mathcal{E}_1$ and $\mathcal{E}_2$ be the interpolation
errors defined in (4.17). Then

$$\|\mathcal{E}_1\|_\infty = o_p((\tau_{2k-3} - \tau_0)^{2k})$$

and

$$\|\mathcal{E}_2\|_\infty = o_p((\tau_{2k-3} - \tau_0)^{2k}) + O_p(n^{-2k/(2k+1)}).$$

Proof. A detailed proof can be found in Balabdaoui and Wellner [4],
Appendix 3. □

Proof of Lemma 3.1 for the MLE. From our study of the distance
between the knots of the LSE, and using very similar calculations, we can
show that for all $\bar\tau \in \bigcup_{j=0}^{2k-4} (\tau_j, \tau_{j+1})$,

$$(-1)^k g_0^{(k)}(\bar\tau)\,(-1)^k e_k(\bar\tau) \le \mathbb{E}_n + \mathbb{R}_n - g_0(\tau_0)\bigl(\mathcal{E}_1(\bar\tau) + \mathcal{E}_2(\bar\tau)\bigr),$$

which implies that by the results obtained for the LSE,

$$D (\tau_{2k-3} - \tau_0)^{2k} (1 + o_p(1)) \le O_p(n^{-2k/(2k+1)}) + g_0(\tau_0)\bigl(\|\mathcal{E}_1\|_\infty + \|\mathcal{E}_2\|_\infty\bigr)$$

for some constant $D > 0$ depending on $k$ and $x_0$. Hence, it follows from
Lemma 4.7 that

$$D (\tau_{2k-3} - \tau_0)^{2k} (1 + o_p(1)) \le O_p(n^{-2k/(2k+1)}),$$

which yields the order $n^{-1/(2k+1)}$ for the distance between the knots of
the MLE in the neighborhood of $x_0$. □

5. Conclusions and discussion. As noted in Section 1, one of the
motivations for this work was to try to approach the problem of pointwise
limit theory for the MLE's in both the forward and inverse problems for the
family of completely monotone densities on $\mathbb{R}^+$. This is one very
important special case of the family of nonparametric mixture models with a
smooth kernel as was mentioned in part (b) of our discussion in Section 1.
Jewell [22] established consistency of the MLE's of $g \in \mathcal{D}_\infty$
and the corresponding mixing distribution function $F$ in this setting, but
local rates of convergence and limiting distribution theory remain unknown.
Our initial hope was that we might be able to learn about the problem with
$k = \infty$ by studying the problem for fixed $k$ and then taking limits as
$k \to \infty$. Unfortunately, we now believe that new tools and methods will
be needed. The following discusses the state of affairs as we understand it
now.

In terms of rates of convergence and localization properties, our
development here shows that the local behavior of the estimators near a fixed
point $x_0 > 0$ becomes dependent on an increasing number of jump points or
knots in the spline problem. In other words, one needs to consider $2k - 2$
consecutive jump points (knots) $\tau_{n,0} < \cdots < \tau_{n,2k-3}$ of the
$(k - 1)$st derivative of the estimators in a neighborhood of $x_0$ in order
to be able to find a bound on $\tau_{n,j+1} - \tau_{n,j}$,
$j = 0, \ldots, 2k - 4$, as $n \to \infty$. Thus the problem becomes
increasingly "less local" with increasing $k$ and this leads us to suspect
that the situation in the $k = \infty$ (or completely monotone) problem might
be only "weakly local" or perhaps even "completely nonlocal" in senses yet to
be precisely defined.

Another aspect of this problem is that although the MLE is asymptotically
equivalent to the (mass-unconstrained) LSE for each fixed $k$ if our
conjecture (1.7) holds, they seem to differ increasingly as $k$ increases. For
$k = 1$, the MLE and the LSE are identical; for $k = 2$, the MLE differs from
the (mass-unconstrained) LSE, but the LSE always has total mass 1. For
$k \ge 3$, the MLE and LSE differ, and, moreover, the total amount of mass in
the unconstrained LSE for $n = 1$ is
$M_k = ((2k - 1)/k)(1 - 1/(2k - 1))^{k-1} \nearrow 2e^{-1/2} \approx 1.21306\ldots \ne 1$
as $k \to \infty$. We do not know how the mass of the unconstrained LSE
behaves jointly in $n$ and $k$, even though (by consistency) the mass of the
LSE converges to 1 as $n \to \infty$ for fixed $k$. We also do not even know
if the unconstrained LSE exists for the scale mixture of exponentials, even
though it is clear that the constrained estimator (defined by the least
squares criterion minimized over $\mathcal{D}_\infty$ rather than
$\mathcal{M}_\infty$) with mass 1 does exist. Since our current proof
techniques rely so heavily on showing equivalence between the MLE and the
(unconstrained) LSE, it seems likely that new methods will be required. We do
not know if the (mass-)constrained LSE's and the MLE's are asymptotically
equivalent either for finite $k$ or for $k = \infty$. Our current plan is to
study the constrained LSE's with total mass constrained to be 1 for finite
sample sizes in order to investigate the asymptotic equivalence of these
mass-constrained LSE's and the MLE's and to (perhaps) extend this study to
$k = \infty$ via limits on $k$. We do not yet know the "right" Gaussian
version of the estimation problem in the completely monotone case.
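
The behavior of the mass $M_k = ((2k-1)/k)(1 - 1/(2k-1))^{k-1}$ quoted above is easy to tabulate. The short Python sketch below is illustrative only (the function name is hypothetical): it evaluates the expression exactly for small $k$ and compares a large-$k$ value with the limit $2e^{-1/2}$.

```python
from fractions import Fraction
from math import exp

def lse_mass_n1(k):
    """M_k = ((2k-1)/k) * (1 - 1/(2k-1))^(k-1), as an exact rational."""
    return Fraction(2 * k - 1, k) * (1 - Fraction(1, 2 * k - 1))**(k - 1)

# Exact values for the first few k, then a float comparison with the limit.
for k in (1, 2, 3, 4, 5):
    print(k, lse_mass_n1(k), float(lse_mass_n1(k)))
print(float(lse_mass_n1(1000)), 2 * exp(-0.5))
```

For $k = 1$ and $k = 2$ the expression equals exactly 1, in agreement with the statements above that the LSE has total mass 1 in those two cases; from $k = 3$ on it exceeds 1.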

Another way to view these difficulties might be to take the following
perspective: since more knowledge is available concerning the MLE's for
the families $\mathcal{D}_k$ with $k$ finite and since $\mathcal{D}_\infty$
is the intersection of all of the $\mathcal{D}_k$'s (and hence well
approximated by $\mathcal{D}_k$ with $k$ large), we can fruitfully
consider estimation via model selection, choosing $k$ based on the data, over
the collection $\bigcup_{k=1}^{\infty} \mathcal{D}_k$.

In summary, we have tried to shed some more light on the local behavior of
two nonparametric estimators of a $k$-monotone density, the Maximum
Likelihood and Least Squares estimators. We have shown that they are both
adaptive splines of degree $k - 1$ with knots determined by the data and their
corresponding criterion functions. When $(-1)^k g_0^{(k)}(x_0) > 0$, the
distance between their knots in a neighborhood of a point $x_0 > 0$ was shown
to be of order $n^{-1/(2k+1)}$ if a conjecture concerning the uniform
boundedness of the interpolation error in a new Hermite interpolation problem
holds. Once this control of the distance between the knots is available,
pointwise limit distribution theory follows via a route paralleling previous
results for $k = 1, 2$. Although we do not exclude the possibility that this
order could be established via other approaches, we hope that the techniques
developed here demonstrate that there could still be many interesting and
powerful connections between statistics and approximation theory.

APPENDIX: PROOFS FROM EMPIRICAL PROCESSES THEORY 

The following lemma is a slight generalization of Lemma 4.1 of Kim
and Pollard [24], page 201.

Lemma A.1. Let $\mathcal{F}$ be a collection of functions defined on
$[x_0 - \delta, x_0 + \delta]$, with $\delta > 0$ small. Suppose that for a
fixed $x \in [x_0 - \delta, x_0 + \delta]$ and $R > 0$ such that
$[x, x + R] \subset [x_0 - \delta, x_0 + \delta]$, the collection

$$\mathcal{F}_{x,R} = \{ f_{x,y} = f\,1_{[x,y]} :\ f \in \mathcal{F},\ x \le y \le x + R \}$$

admits an envelope $F_{x,R}$ such that

$$E F_{x,R}^2(X_1) \le K R^{2d-1}, \qquad R \le R_0,$$

for some $d > 1/2$, where $K > 0$ depends only on $x_0$ and $\delta$.
Moreover, suppose that

$$\sup_Q \int_0^1 \sqrt{\log N\bigl(\eta \|F_{x,R}\|_{Q,2}, \mathcal{F}_{x,R}, L_2(Q)\bigr)}\,d\eta < \infty. \tag{A.1}$$

Then for each $\varepsilon > 0$, there exist random variables $M_n$ of order
$O_p(1)$ such that

$$|(\mathbb{G}_n - G_0)(f_{x,y})| \le \varepsilon |y - x|^{k+d} + n^{-(k+d)/(2k+1)} M_n \qquad \text{for } |y - x| \le R_0. \tag{A.2}$$

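Proof. By van der Vaart and Wellner [40], Theorem 2.14.1, page 239,
it follows that

$$E\Bigl\{ \sup_{f_{x,y} \in \mathcal{F}_{x,R}} |(\mathbb{G}_n - G_0)(f_{x,y})| \Bigr\}^2 \le \frac{K}{n}\, E F_{x,R}^2(X_1) = O(n^{-1} R^{2d-1}) \tag{A.3}$$

for some constant $K > 0$ depending only on $x_0$, $\delta$ and the entropy
integral in (A.1). For any $f_{x,y} \in \mathcal{F}_{x,R}$, we write

$$(\mathbb{P}_n - P_0)(f_{x,y}) = (\mathbb{G}_n - G_0)(f_{x,y})$$

and define $M_n$ by

$$M_n = \inf\bigl\{ D > 0 :\ |(\mathbb{P}_n - P_0)(f_{x,y})| \le \varepsilon (y - x)^{k+d} + n^{-(k+d)/(2k+1)} D \ \text{ for all } f_{x,y} \in \mathcal{F}_{x,R} \bigr\}$$

and $M_n = \infty$ if no $D > 0$ satisfies the required inequality. For
$1 \le j \le \lfloor R n^{1/(2k+1)} \rfloor = j_n$, we have

$$
\begin{aligned}
P(M_n > m) &\le P\bigl( |(\mathbb{P}_n - P_0)(f_{x,y})| > \varepsilon (y - x)^{k+d} + n^{-(k+d)/(2k+1)} m \ \text{ for some } f_{x,y} \in \mathcal{F}_{x,R} \bigr) \\
&\le \sum_{1 \le j \le j_n} P\bigl\{ n^{(k+d)/(2k+1)} |(\mathbb{P}_n - P_0)(f_{x,y})| > \varepsilon (j - 1)^{k+d} + m \\
&\qquad\qquad \text{ for some } f_{x,y} \text{ with } (j - 1) n^{-1/(2k+1)} \le y - x \le j n^{-1/(2k+1)} \bigr\} \\
&\le \sum_{1 \le j \le j_n} n^{2(k+d)/(2k+1)}\, \frac{E\bigl\{ \sup_{y:\, 0 \le y - x \le j n^{-1/(2k+1)}} |(\mathbb{P}_n - P_0)(f_{x,y})| \bigr\}^2}{(\varepsilon (j - 1)^{k+d} + m)^2} \\
&\le C \sum_{1 \le j \le j_n} \frac{n^{2(k+d)/(2k+1)}\, n^{-1}\, n^{-(2d-1)/(2k+1)}\, j^{2d-1}}{(\varepsilon (j - 1)^{k+d} + m)^2} \\
&= C \sum_{1 \le j \le j_n} \frac{j^{2d-1}}{(\varepsilon (j - 1)^{k+d} + m)^2} \le C \sum_{j=1}^{\infty} \frac{j^{2d-1}}{(\varepsilon (j - 1)^{k+d} + m)^2} \to 0
\end{aligned}
$$

as $m \to \infty$, where $C > 0$ is a constant that depends only on $x_0$,
$\delta$. Therefore, it follows that (A.2) holds. □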

In the following, we present VC-subgraph proofs for Lemma 4.4.

Proposition A.1. For $k \ge 2$, the class of functions
$\mathcal{F}_{y_0,R}$ given in (4.11) is a VC-subgraph class.

Proof. We first show that the class of subgraphs

$$
\begin{aligned}
\mathcal{C} = \bigl\{ \{(t, c) \in \mathbb{R}^+ \times \mathbb{R} : c < f_t(x)\} :\ & x \in [y_0, y_{2k-3}], \\
& x_0 - \delta \le y_0 < y_1 < \cdots < y_{2k-3} \le y_0 + R \le x_0 + \delta \bigr\}
\end{aligned}
$$

is a VC class of sets in $\mathbb{R}^+ \times \mathbb{R}$. If we show this,
then the class of functions (4.11) is VC-subgraph. Alternatively, from van der
Vaart and Wellner [40], problem 11, page 152, it suffices to show that the
"between graphs"

$$
\begin{aligned}
\mathcal{C}_1 = \bigl\{ \{(t, c) \in \mathbb{R}^+ \times \mathbb{R} : 0 \le c \le f_t(x) \text{ or } f_t(x) \le c \le 0\} :\ & x \in [y_0, y_{2k-3}], \\
& x_0 - \delta \le y_0 < y_1 < \cdots < y_{2k-3} \le y_0 + R \le x_0 + \delta \bigr\}
\end{aligned}
$$

constitute a VC class of sets. Let

$$
\begin{aligned}
\mathcal{C}_{1,j} = \bigl\{ \{(t, c) \in \mathbb{R}^+ \times \mathbb{R} :\ & 0 \le c \le f_t(x) 1_{[y_{j-1}, y_j]}(t) \text{ or } f_t(x) 1_{[y_{j-1}, y_j]}(t) \le c \le 0\} : \\
& x \in [y_0, y_{2k-3}],\ x_0 - \delta \le y_0 < y_1 < \cdots < y_{2k-3} \le y_0 + R \le x_0 + \delta \bigr\}
\end{aligned}
$$

for $j = 1, \ldots, 2k - 3$. Since
$t \mapsto f_t(x) 1_{[y_{j-1}, y_j]}(t)$ is a polynomial of degree at most
$k - 1$ for each $j = 1, \ldots, 2k - 3$, the classes $\mathcal{C}_{1,j}$ are
all VC classes. Also, note that

$$\mathcal{C}_1 \subset \mathcal{C}_{1,1} \sqcup \cdots \sqcup \mathcal{C}_{1,2k-3} \equiv \mathcal{C}_{\sqcup k}.$$

By Dudley [10], Theorem 2.5.3, page 153, $\mathcal{C}_{\sqcup k}$ is a VC
class (or see van der Vaart and Wellner [40], Lemma 2.6.17, part (iii), page
147). Hence, $\mathcal{C}_1$ is a VC class and $\mathcal{F}_{y_0,R}$ is a
VC-subgraph class. □